Fix example command lines in README to use `--local_rank` and `--distributed_backend` correctly (msr-fiddle#7)
deepakn94 authored Sep 26, 2019
1 parent b80d16b commit 24acc3a
Showing 2 changed files with 18 additions and 12 deletions.
11 changes: 7 additions & 4 deletions README.md
@@ -107,14 +107,17 @@ python convert_graph_to_model.py -f vgg16_partitioned/gpus=4.txt -n VGG16Partiti
[from `pipedream/runtime/image_classification`; run on 4 GPUs (including a single server with 4 GPUs)]

```bash
-python main_with_runtime.py --module models.vgg16.gpus=4 -b 64 --data_dir <path to ImageNet> --rank 0 --master_addr <master IP address> --config_path models/vgg16/gpus=4/hybrid_conf.json
-python main_with_runtime.py --module models.vgg16.gpus=4 -b 64 --data_dir <path to ImageNet> --rank 1 --master_addr <master IP address> --config_path models/vgg16/gpus=4/hybrid_conf.json
-python main_with_runtime.py --module models.vgg16.gpus=4 -b 64 --data_dir <path to ImageNet> --rank 2 --master_addr <master IP address> --config_path models/vgg16/gpus=4/hybrid_conf.json
-python main_with_runtime.py --module models.vgg16.gpus=4 -b 64 --data_dir <path to ImageNet> --rank 3 --master_addr <master IP address> --config_path models/vgg16/gpus=4/hybrid_conf.json
+python main_with_runtime.py --module models.vgg16.gpus=4 -b 64 --data_dir <path to ImageNet> --rank 0 --local_rank 0 --master_addr <master IP address> --config_path models/vgg16/gpus=4/hybrid_conf.json --distributed_backend gloo
+python main_with_runtime.py --module models.vgg16.gpus=4 -b 64 --data_dir <path to ImageNet> --rank 1 --local_rank 1 --master_addr <master IP address> --config_path models/vgg16/gpus=4/hybrid_conf.json --distributed_backend gloo
+python main_with_runtime.py --module models.vgg16.gpus=4 -b 64 --data_dir <path to ImageNet> --rank 2 --local_rank 2 --master_addr <master IP address> --config_path models/vgg16/gpus=4/hybrid_conf.json --distributed_backend gloo
+python main_with_runtime.py --module models.vgg16.gpus=4 -b 64 --data_dir <path to ImageNet> --rank 3 --local_rank 3 --master_addr <master IP address> --config_path models/vgg16/gpus=4/hybrid_conf.json --distributed_backend gloo
```

`master IP address` here is the IP address of the rank 0 process. On a server with 4 GPUs, `localhost` can be specified.
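
On a single 4-GPU server, the four launch commands differ only in the rank, so they can be generated in a loop. The following is a minimal sketch, assuming the local rank equals the global rank on one machine (as in the commands shown for this example). `NUM_GPUS`, `MASTER_ADDR`, `DATA_DIR`, and the `launch_cmd` helper are illustrative names, not part of PipeDream, and the script only prints the commands rather than running them.

```bash
#!/usr/bin/env bash
# Sketch: print one launch command per rank for a single 4-GPU server.
# NUM_GPUS, MASTER_ADDR, DATA_DIR and launch_cmd are illustrative names,
# not part of the PipeDream runtime.
NUM_GPUS=4
MASTER_ADDR=localhost            # rank 0 runs on this same machine
DATA_DIR="/path/to/imagenet"     # placeholder: set to your ImageNet directory

launch_cmd() {
  local rank=$1
  # Assumption: on a single server the local rank equals the global rank.
  echo "python main_with_runtime.py --module models.vgg16.gpus=${NUM_GPUS}" \
       "-b 64 --data_dir ${DATA_DIR}" \
       "--rank ${rank} --local_rank ${rank}" \
       "--master_addr ${MASTER_ADDR}" \
       "--config_path models/vgg16/gpus=${NUM_GPUS}/hybrid_conf.json" \
       "--distributed_backend gloo"
}

for ((rank = 0; rank < NUM_GPUS; rank++)); do
  launch_cmd "${rank}"
done
```

To start the processes instead of printing them, each generated command could be run in the background and followed by a single `wait`.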

+When running DP setups, please use the `nccl` backend for optimal performance. When running hybrid setups, please use
+the `gloo` backend.


## Code of Conduct

19 changes: 11 additions & 8 deletions runtime/README.md
@@ -55,27 +55,30 @@ important).
With input pipelining,

```bash
-python main_with_runtime.py --module models.resnet50.gpus=2 -b 64 --data_dir ../../../data/imagenet --rank 0 --master_addr v11 --config_path models/resnet50/gpus=2/mp_conf.json
-python main_with_runtime.py --module models.resnet50.gpus=2 -b 64 --data_dir ../../../data/imagenet --rank 1 --master_addr v11 --config_path models/resnet50/gpus=2/mp_conf.json
+python main_with_runtime.py --module models.resnet50.gpus=2 -b 64 --data_dir ../../../data/imagenet --rank 0 --local_rank 0 --master_addr localhost --config_path models/resnet50/gpus=2/mp_conf.json --distributed_backend gloo
+python main_with_runtime.py --module models.resnet50.gpus=2 -b 64 --data_dir ../../../data/imagenet --rank 1 --local_rank 1 --master_addr localhost --config_path models/resnet50/gpus=2/mp_conf.json --distributed_backend gloo
```

Without input pipelining,

```bash
-python main_with_runtime.py --module models.resnet50.gpus=2 -b 64 --data_dir ../../../data/imagenet --rank 0 --master_addr v11 --config_path models/resnet50/gpus=2/mp_conf.json --no_input_pipelining
-python main_with_runtime.py --module models.resnet50.gpus=2 -b 64 --data_dir ../../../data/imagenet --rank 1 --master_addr v11 --config_path models/resnet50/gpus=2/mp_conf.json --no_input_pipelining
+python main_with_runtime.py --module models.resnet50.gpus=2 -b 64 --data_dir ../../../data/imagenet --rank 0 --local_rank 0 --master_addr localhost --config_path models/resnet50/gpus=2/mp_conf.json --no_input_pipelining --distributed_backend gloo
+python main_with_runtime.py --module models.resnet50.gpus=2 -b 64 --data_dir ../../../data/imagenet --rank 1 --local_rank 1 --master_addr localhost --config_path models/resnet50/gpus=2/mp_conf.json --no_input_pipelining --distributed_backend gloo
```

With data parallelism (and no input pipelining),

```bash
-python main_with_runtime.py --module models.resnet50.gpus=2 -b 128 --data_dir ../../../data/imagenet --rank 0 --master_addr v11 --config_path models/resnet50/gpus=2/dp_conf.json --no_input_pipelining
-python main_with_runtime.py --module models.resnet50.gpus=2 -b 128 --data_dir ../../../data/imagenet --rank 1 --master_addr v11 --config_path models/resnet50/gpus=2/dp_conf.json --no_input_pipelining
+python main_with_runtime.py --module models.resnet50.gpus=2 -b 128 --data_dir ../../../data/imagenet --rank 0 --local_rank 0 --master_addr localhost --config_path models/resnet50/gpus=2/dp_conf.json --no_input_pipelining --distributed_backend nccl
+python main_with_runtime.py --module models.resnet50.gpus=2 -b 128 --data_dir ../../../data/imagenet --rank 1 --local_rank 1 --master_addr localhost --config_path models/resnet50/gpus=2/dp_conf.json --no_input_pipelining --distributed_backend nccl
```

+Note that for DP-only setups, we use the `nccl` backend for optimal performance.


With hybrid parallelism (model and data parallelism, and pipelining),

```bash
-python main_with_runtime.py --module models.resnet50.gpus=2 -b 64 --data_dir ../../../data/imagenet --rank 0 --master_addr v11 --config_path models/resnet50/gpus=2/hybrid_conf.json
-python main_with_runtime.py --module models.resnet50.gpus=2 -b 64 --data_dir ../../../data/imagenet --rank 1 --master_addr v11 --config_path models/resnet50/gpus=2/hybrid_conf.json
+python main_with_runtime.py --module models.resnet50.gpus=2 -b 64 --data_dir ../../../data/imagenet --rank 0 --local_rank 0 --master_addr localhost --config_path models/resnet50/gpus=2/hybrid_conf.json --distributed_backend gloo
+python main_with_runtime.py --module models.resnet50.gpus=2 -b 64 --data_dir ../../../data/imagenet --rank 1 --local_rank 1 --master_addr localhost --config_path models/resnet50/gpus=2/hybrid_conf.json --distributed_backend gloo
```
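
Across these examples the backend follows the configuration: the DP-only commands pass `nccl`, while the model-parallel and hybrid commands pass `gloo`. A minimal sketch of that selection follows. It assumes DP-only configs are named `dp_conf.json`, as in the examples here; the `pick_backend` helper is hypothetical and not part of the runtime.

```bash
#!/usr/bin/env bash
# Sketch: choose the --distributed_backend value from the config file name.
# Assumption: DP-only configs are named dp_conf.json, as in these examples.
# pick_backend is an illustrative helper, not part of the PipeDream runtime.
pick_backend() {
  case "$(basename "$1")" in
    dp_conf.json) echo nccl ;;   # data parallelism only
    *)            echo gloo ;;   # model-parallel and hybrid setups
  esac
}

for conf in mp_conf.json dp_conf.json hybrid_conf.json; do
  path="models/resnet50/gpus=2/${conf}"
  echo "${conf}: --distributed_backend $(pick_backend "${path}")"
done
```

Keying the choice on the file name is only a convenience for these examples; a config with a different name would need the backend passed explicitly.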
