This repository was archived by the owner on Aug 11, 2020. It is now read-only.

v0.0.10 release #5

Merged
merged 24 commits into from
Feb 13, 2018
c84627d
tweak: Add support for the "Cancelled" job state.
colin-welch Jan 31, 2018
183a511
update Pipfile Pipfile.lock for build env
sanfilip Feb 8, 2018
58b5dd5
add set_apikey function
sanfilip Feb 9, 2018
a48ca80
add run and apikey commands
sanfilip Feb 9, 2018
fc977ae
fix run imports handling; add run entrypoint support; fix params muta…
sanfilip Feb 9, 2018
9c2f9a5
update version to 0.0.10
sanfilip Feb 9, 2018
56cd2a0
handle script name is first arg to run func
sanfilip Feb 10, 2018
bf75a33
change run options required fields; add no_logging and ignoreFiles ru…
sanfilip Feb 11, 2018
c9c6345
change zip_to_tmp to take a list of files, skip a list of ignore_file…
sanfilip Feb 11, 2018
4038a5e
add file existence checking for python script; simplify pipenv option
sanfilip Feb 11, 2018
f5e8358
add python run as module option; add support for python script args; …
sanfilip Feb 12, 2018
df1fb64
update test files; add myscript.py sample
sanfilip Feb 12, 2018
b271765
rename test directory to tests
sanfilip Feb 12, 2018
8093f9b
update README
sanfilip Feb 12, 2018
054077b
update README
sanfilip Feb 12, 2018
d07355c
update README with Dockerfile link for default container build
sanfilip Feb 12, 2018
404b988
reformat doc string
sanfilip Feb 12, 2018
9f387b2
reformat README
sanfilip Feb 12, 2018
919dda8
add -c option, - option, allow arbitrary args by terminating option p…
sanfilip Feb 13, 2018
a9a7891
fix pipenv scripting; fix -m support; add -c support; fix --command s…
sanfilip Feb 13, 2018
4e842cb
update README
sanfilip Feb 13, 2018
9b232ef
add tensorflow test script
sanfilip Feb 13, 2018
a566800
fix test.py
sanfilip Feb 13, 2018
eb59b9a
update Pipfile.lock
sanfilip Feb 13, 2018
4 changes: 4 additions & 0 deletions Pipfile
Expand Up @@ -8,6 +8,10 @@ verify_ssl = true
[packages]

"e1839a8" = {path = ".", editable = true}
requests = {extras = ["security"]}
"boto3" = "*"
botocore = "*"
six = "*"


[dev-packages]
Expand Down
14 changes: 7 additions & 7 deletions Pipfile.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

222 changes: 207 additions & 15 deletions README.md
@@ -1,8 +1,8 @@
Paperspace Python
=================

Sample usage
============
Getting Started
===============
1. Make sure you have a Paperspace account set up. Go to http://paperspace.com
to register.

Expand All @@ -29,41 +29,229 @@ Sample usage
Note: your api key is cached in ~/.paperspace/config.json
You can remove your cached api key by executing:

`paperspace-logout logout`
`paperspace-python logout`

5. Execute the sample Python script hello.py:
5. Run the sample script hello.py using Python:

`python hello.py`

The script will be run on the Paperspace job cluster node, and its output will be
logged locally.
The source of this sample script shows how a script can automatically run itself on the Paperspace job cluster node:

```
import paperspace

A slightly more complex example
===============================
# test/test_remote.py
paperspace.run()

import os
print('hello paperspace-python!')
```

Note: the source is modified before transfer to the job cluster in order to remove imported `paperspace` references.

6. Use paperspace-python to run a python script remotely:

`paperspace-python run myscript.py`

The script will be run on the Paperspace job cluster node, and its output will be logged locally.


Specifying jobs options within a script
=======================================
This example shows how a script can specify paperspace jobs options for itself, such as `project` name, `machineType`, and a `container` reference:

# tests/test_remote.py - runs itself on paperspace, demonstrates setting jobs create options
import os
import paperspace

paperspace.run({'project': 'myproject', 'machineType': 'GPU+', 'container': 'Test-Container'})
paperspace.run({'project': 'myproject', 'machineType': 'P5000',
'container': 'paperspace/tensorflow-python'})

print(os.getcwd())

print('something useful')


Automatic running of a python script remotely
=============================================
The above example demonstrates running a python script locally and having that script transmit itself to the paperspace jobs cluster for further execution. To do this a copy of the local script is modified before transmission to the jobs cluster, in order to strip out the `import paperspace` statements and other `paperspace` library references. There are also some limitations on the types of import statements that are supported, and the dependencies that are supported in each environment (local vs. remote):

1. You need to use a bare import statement, `import paperspace`, and not use the `import paperspace as ...` form.
2. The import form `from paperspace import ...` is currently not supported.
3. Everything after the `paperspace.run()` function call is ignored when running locally (when no script name is provided). The local script execution stops after the `paperspace.run()` call.
4. Dependencies that are included before `paperspace.run()` must be available locally.
5. If you need to reference dependencies that are not available locally but are available remotely, those should be imported after the `paperspace.run()` call.
6. Dependencies that are needed remotely need to either be already installed in the container used for the job, or need to be installed using one of the techniques below in the section [Dependency Options](#dependency-options)

Because of these limitations it may not always be appropriate to run python scripts automatically from within the same script file. As an alternative you can run your python scripts unmodified using the techniques below.
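
To illustrate why only the bare `import paperspace` form is supported, here is a minimal sketch of how a source file could be stripped of `paperspace` references before upload. This is not the library's actual implementation: the function name `strip_paperspace_refs` is hypothetical, and the sketch assumes each `paperspace.run(...)` call fits on a single line.

```python
import re


def strip_paperspace_refs(source):
    """Return source with bare `import paperspace` lines and
    single-line `paperspace.run(...)` calls removed.

    Hypothetical helper for illustration only.
    """
    kept = []
    for line in source.splitlines():
        stripped = line.strip()
        # Matches only the bare form; `import paperspace as ps` is left alone.
        if re.match(r'import\s+paperspace\s*(#.*)?$', stripped):
            continue
        # Assumption: the run() call does not span multiple lines.
        if stripped.startswith('paperspace.run('):
            continue
        kept.append(line)
    return '\n'.join(kept) + '\n'


if __name__ == '__main__':
    sample = "import os\nimport paperspace\n\npaperspace.run()\n\nprint('hi')\n"
    print(strip_paperspace_refs(sample))
```

A line-based filter like this cannot follow aliases or `from paperspace import ...` names, which is consistent with limitations 1 and 2 above.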


Running a python script by name
===============================
You can run a python script on paperspace from the command line as follows:

paperspace-python run myscript.py

You can also provide additional jobs options on the command line:

paperspace-python run myscript.py --project myproject --machineType P5000 \
--container paperspace/tensorflow-python

Alternatively you can use the `paperspace.run()` function in code, with a script file name as the first argument:

import paperspace

paperspace.run('myscript.py') # runs myscript on paperspace

In code you can provide additional paperspace jobs create options in a dict in the second argument to run():

paperspace.run('myscript.py', {'project': 'myproject', 'machineType': 'P5000',
'container': 'paperspace/tensorflow-python'})

See the Paperspace API [jobs create](https://paperspace.github.io/paperspace-node/jobs.html#.create) documentation for the full list of jobs create options that can be specified.


Using paperspace-python run
===========================
The `paperspace-python run` command provides a number of options to run python code and other commands remotely, as well as copy files and set up python dependencies:

paperspace-python run [options] [[-m] <script> [args] | -c "python code" | --command "shell cmd"]
options:
[--python 2|3]
[--init [<init.sh>]]
[--pipenv]
[--req [<requirements.txt>]]
[--workspace .|<workspace_path>]
[--ignoreFiles "<file-or-dir>,..."]
[jobs create options]
[--dryrun]
[-]

Basic Run Scenarios
===================
1. Run a python script remotely:

`paperspace-python run <python_script.py> [args]`

Example:

`paperspace-python run myscript.py a b c`

2. Run a python module remotely using the `-m` option:

`paperspace-python run -m <module_path> [args]`

Example:

`paperspace-python run -m pip --version`

3. Run a python command remotely using the `-c` option:

`paperspace-python run -c "python_statement;..."`

Example:

`paperspace-python run -c "import os; print(os.getcwd())"`

4. Run an executable or shell command remotely using the `--command` option:

`paperspace-python run --command "<executable or shell command>"`

Example:

`paperspace-python run --command "ls -al"`

Run Options
===========
The `<script>` option is a python script or path to a python module. The script or module will be uploaded if it exists on the local file system.

Other script `args` can be provided after the python script or module path. You can use the `-` option to suppress interpretation of the list of script args as `paperspace-python run` options.

The `-m <module path>` option runs the specified library module as a script. This is equivalent to the `-m` option of the `python` executable. Further paperspace run option processing is disabled after the `-m` option.

The `-c "python_statement;..."` option runs the specified python statements. This is equivalent to the `-c` option of the `python` executable. Further paperspace run option processing is disabled after the `-c` option.

The `-` option disables further run command option processing and passes the remaining arguments to the script specified. This allows you to pass arguments to your script that might otherwise conflict with run command options or jobs create options.
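
As a rough illustration of the `-` terminator, the sketch below splits an argument list at the first bare `-`. The function name `split_at_terminator` is hypothetical, and the real option parser may behave differently in edge cases.

```python
def split_at_terminator(argv):
    """Split argv at the first bare '-' into (run_args, script_args).

    Hypothetical helper: everything before '-' may be parsed as run or
    jobs create options; everything after is passed to the script as-is.
    """
    if '-' in argv:
        i = argv.index('-')
        return argv[:i], argv[i + 1:]
    return list(argv), []


if __name__ == '__main__':
    run_args, script_args = split_at_terminator(
        ['myscript.py', '--project', 'myproject', '-', '--project', 'x'])
    print(run_args)
    print(script_args)
```

In this sketch the second `--project` reaches the script untouched instead of being read as a jobs create option.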

The `--command "shell cmd"` option is used to run an arbitrary executable or shell command inside the container. Note: the executable or shell command must already be available inside the container image, or be copied over using the `--workspace` option.

Job Options
===========
The `--workspace` option allows you to specify a workspace file or directory to upload, or a git repo link to download and merge with the container. For example, to upload the current directory along with a script file run:

paperspace-python run myscript.py --workspace .

See the Paperspace API [jobs create](https://paperspace.github.io/paperspace-node/jobs.html#.create) documentation for more details on the `--workspace` option and related options.

The `--ignoreFiles "<file-or-dir>,..."` option can be used to specify a simple comma-separated list of files and directories to ignore for the workspace upload:

paperspace-python run myscript.py --workspace . --ignoreFiles "hello.py,paperspace"

The following files and directories are ignored by default: `.git`, `.gitignore`, `__pycache__`.
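
The following sketch shows one way the `--ignoreFiles` value could be combined with the default ignore list. The helper names are hypothetical, and matching against any path component is an assumption; the tool itself may only compare top-level names.

```python
import os

# Defaults as listed in this README.
DEFAULT_IGNORES = ['.git', '.gitignore', '__pycache__']


def parse_ignore_files(option_value):
    """Turn a comma-separated --ignoreFiles value into a list of names,
    appended to the default ignores. Hypothetical helper."""
    names = [n.strip() for n in option_value.split(',') if n.strip()]
    return DEFAULT_IGNORES + names


def is_ignored(path, ignores):
    """Return True if any component of path is in the ignore list.
    Assumption: matching is by exact name, not by glob pattern."""
    parts = os.path.normpath(path).split(os.sep)
    return any(part in ignores for part in parts)


if __name__ == '__main__':
    ignores = parse_ignore_files('hello.py,paperspace')
    for candidate in ['myscript.py', 'hello.py',
                      os.path.join('paperspace', 'jobs.py')]:
        print(candidate, is_ignored(candidate, ignores))
```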

Other `jobs create options` can be specified, such as `--machineType <machine type>`, `--container <container image reference>`, and `--project <project name>`.

Here are some of the other jobs create options available:

- `--project "<project name>"` (defaults to 'paperspace-python')
- `--machineType [GPU+|P4000|P5000|P6000|V100]` (defaults to P5000)
- `--container <docker image link or paperspace container name>` (defaults to `docker.io/paperspace/tensorflow-python`)
- `--name "<job name>"` (defaults to 'job for project <project name>')
- `--projectId "<existing paperspace project id>"`
- `--registryUsername "<username>"` (for access to a private docker registry)
- `--registryPassword "<secretpw>"` (for access to a private docker registry)
- `--workspaceUsername "<username>"` (for access to a private git repo)
- `--workspacePassword "<secretpw>"` (for access to a private git repo)

See the Paperspace API [jobs create](https://paperspace.github.io/paperspace-node/jobs.html#.create) documentation for a complete description of these options.

Dependency Options
==================
When running python scripts on paperspace you may need to provide additional dependencies to your scripts or specify the python version.
The `paperspace-python run` command has several options to support this: `--python`, `--init`, `--pipenv`, and `--req`. In addition you can use the `--workspace` option above to upload file dependencies.

The `--python 2|3` option allows you to specify whether to use `python2` or `python3` when running the script on paperspace.
If omitted, the script will be run with the same major version as is being used to run `paperspace-python` locally.

The `--init [<init.sh>]` option is used to specify a script to be run on the remote machine, inside the container, before the python script is run.
If the init script name is omitted, it is assumed to be the script named `init.sh` in the current directory. The script is run using
`source init.sh` in the container bash shell. You can use this option to provide a list of commands to run to set up the dependencies for the script, such as running a list of `pip install` commands. However, if you are using `pipenv` or a `requirements.txt` file we recommend you use one of the options below. Multiple dependency setup options can also be combined.

The `--pipenv` option is used to upload and run the `Pipfile` and `Pipfile.lock` files in the current directory, if found. These files
are used on the paperspace machine to initialize the python environment using the `pipenv` tool, by running `pipenv install` within the container. Note: `pipenv` must already be installed in the container for this option to work. The default container used by paperspace-python already has the `pipenv` package installed.

The `--req [<requirements.txt>]` option is used to specify that a `requirements.txt` file should be used to install the required python dependencies using `pip`.
By default this option looks for a file named `requirements.txt` in the current directory, but you can override this by specifying a different file name. Note: `pip` must already be installed in the container for this option to work. The default container used by paperspace-python already has the `pip` package installed.

The `--dryrun` option allows you to see the resultant script that will be run on the paperspace job runner without actually running it.

The above options can be used in any combination; however, the order of operations is fixed as follows:

1. `source <init.sh>` is run if `--init <init.sh>` is specified
2. `pipenv [--two|--three] install` is run if `--pipenv` is specified
3. `pip[2|3] install -r requirements.txt` is run if `--req <requirements.txt>` is specified
4. `python[2|3] myscript.py` is run

As mentioned above, you can use the `--dryrun` option to see the resultant commands that will be run on the paperspace jobs cluster node for a given set of options, without actually running the commands.
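
The fixed ordering can be sketched as a function that assembles the command list from the chosen options. This is an illustration based only on the four steps listed above; `build_commands` is a hypothetical name, and the exact command strings the tool emits may differ (use `--dryrun` to see the real ones).

```python
import sys


def build_commands(script, python=None, init=None, pipenv=False, req=None):
    """Return the commands in the fixed order: init, pipenv, pip, python.

    Hypothetical sketch. If `python` is not given, the local major
    version is used, mirroring the documented --python default.
    """
    major = str(python or sys.version_info[0])
    commands = []
    if init:
        commands.append('source %s' % init)
    if pipenv:
        flag = '--two' if major == '2' else '--three'
        commands.append('pipenv %s install' % flag)
    if req:
        commands.append('pip%s install -r %s' % (major, req))
    commands.append('python%s %s' % (major, script))
    return commands


if __name__ == '__main__':
    for cmd in build_commands('myscript.py', python=3, init='init.sh',
                              pipenv=True, req='requirements.txt'):
        print(cmd)
```

Because the order is built into the function rather than taken from the argument order, passing `--req` before `--init` on the command line would not change the result.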


Default Container
=================
If no `container` option is specified when using `paperspace-python run <script.py>` or the `paperspace.run()` function, the default container image used is `paperspace/tensorflow-python` on Docker Hub. This container has the tensorflow-gpu libraries installed for both python2 and python3, as well as several other popular packages, including numpy, scipy, scikit-learn, pandas, Pillow and matplotlib.
It is based off the Google docker image `gcr.io/tensorflow/tensorflow:latest-gpu` with the addition of support for python3, pip3, and pipenv.

A Dockerfile for building this container image is [here](https://github.com/Paperspace/tensorflow-python/).

Other examples
==============
See the scripts in the test folder for other examples.
See the scripts in the `tests` folder for other examples.


Other Authentication options
============================
1. Specify your apiKey explicitly on any of the paperspace.jobs methods, e.g.:
1. Specify your apiKey explicitly on the paperspace.run() function or any of the paperspace.jobs methods, e.g.:

`paperspace.jobs.create({'apiKey': '1qks1hKsU7e1k...', 'project': 'myproject', 'machineType': 'GPU+', 'container': 'Test-Container'})`
```
paperspace.jobs.create({'apiKey': '1qks1hKsU7e1k...', 'project': 'myproject',
'machineType': 'P5000', 'container': 'paperspace/tensorflow-python'})
```

2. Set the package paperspace.config option in your python code:

Expand All @@ -78,6 +266,10 @@ Other Authentication options
Note: the above methods take precedence over use of the cached api key in
`~/.paperspace/config.json`

4. Set the key in the `~/.paperspace/config.json` file from the command line by running:

`paperspace-python apikey 1qks1hKsU7e1k...`
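
For readers curious how a cached key file of this kind can work, here is a minimal sketch of saving and loading a key in a JSON file. The function names and the `apiKey` field name inside the file are assumptions for illustration; the actual layout of `~/.paperspace/config.json` is not described in this README.

```python
import json
import os


def save_api_key(api_key, config_path):
    """Write api_key into a JSON config file, creating the directory
    if needed and preserving any other keys. Hypothetical helper."""
    config_dir = os.path.dirname(config_path)
    if config_dir and not os.path.isdir(config_dir):
        os.makedirs(config_dir)
    config = {}
    if os.path.exists(config_path):
        with open(config_path) as f:
            config = json.load(f)
    config['apiKey'] = api_key  # assumed field name
    with open(config_path, 'w') as f:
        json.dump(config, f)


def load_api_key(config_path):
    """Return the cached key, or None if no config file exists."""
    if not os.path.exists(config_path):
        return None
    with open(config_path) as f:
        return json.load(f).get('apiKey')


if __name__ == '__main__':
    # Uses a scratch path rather than the real ~/.paperspace/config.json.
    path = os.path.join('scratch-paperspace', 'config.json')
    save_api_key('1qks1hKsU7e1k...', path)
    print(load_api_key(path))
```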


Contributing
============
Expand Down
16 changes: 16 additions & 0 deletions myscript.py
@@ -0,0 +1,16 @@
import os
import subprocess
import sys
args = sys.argv[:]
print('hello from %s' % args[0])
print('args: ' + ' '.join(args))
print('current directory: ' + os.getcwd())
p = subprocess.Popen('ls -al', shell=True, bufsize=1, universal_newlines=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
while True:
    line = p.stdout.readline()
    if line != '':
        print(line.rstrip())
    else:
        break
retval = p.wait()
print('%s done' % args[0])
2 changes: 1 addition & 1 deletion paperspace/__init__.py
Expand Up @@ -3,4 +3,4 @@
from .jobs import print_json_pretty, run
from . import jobs

__version__ = "0.0.9"
__version__ = "0.0.10"