Add accelerator API to RPC distributed examples: ddp_rpc, parameter_server, rnn #1371

Open · wants to merge 4 commits into main
Conversation

jafraustro (Contributor)

Add accelerator API to RPC distributed examples:

  • ddp_rpc
  • parameter_server
  • rnn
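
For context, a minimal sketch of the device-selection pattern these examples move to, assuming PyTorch 2.6+ where the torch.accelerator API is available; the fallback branch and variable names are illustrative, not the exact diff:

```python
# Hedged sketch: select the current accelerator (CUDA, XPU, MPS, ...) when one
# is visible, otherwise fall back to CPU. Assumes torch.accelerator exists
# (PyTorch >= 2.6); the model here is a stand-in, not code from the examples.
import torch

if torch.accelerator.is_available():
    device = torch.accelerator.current_accelerator()
else:
    device = torch.device("cpu")

model = torch.nn.Linear(16, 8).to(device)
print(f"Using device: {device}")
```

The point of the pattern is that one code path replaces backend-specific checks such as torch.cuda.is_available(), so the same example runs on any supported accelerator.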

CC: @soumith


Signed-off-by: jafraustro <[email protected]>
netlify bot commented Jul 14, 2025

Deploy Preview for pytorch-examples-preview canceled.

| Name | Link |
|------|------|
| 🔨 Latest commit | a84f91c |
| 🔍 Latest deploy log | https://app.netlify.com/projects/pytorch-examples-preview/deploys/68798280b39c080008fc743c |

jafraustro marked this pull request as ready for review July 14, 2025 16:34
soumith (Member) commented Jul 15, 2025

failing CI

jafraustro (Contributor, Author)

I added numpy to the requirements.txt files.

jafraustro closed this Jul 15, 2025
jafraustro reopened this Jul 15, 2025
soumith (Member) commented Jul 16, 2025

still failing :D

- Added a function to verify minimum GPU count before execution (sketched below).
- Updated HybridModel initialization to use rank instead of device.
- Ensured proper cleanup of the process group to avoid resource leaks.
- Added exit message if insufficient GPUs are detected.

Signed-off-by: jafraustro <[email protected]>
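
A minimal sketch of the GPU-count guard this commit describes, assuming the accelerator API; verify_min_gpu_count matches the name in the commit message, but the threshold and exit message are illustrative:

```python
# Hedged sketch of the minimum-GPU-count guard; the exact body in the PR may differ.
import sys

import torch

def verify_min_gpu_count(min_gpus: int = 2) -> bool:
    """Return True when at least `min_gpus` accelerator devices are visible."""
    return torch.accelerator.is_available() and torch.accelerator.device_count() >= min_gpus

if __name__ == "__main__":
    min_gpus = 2  # the DDP step below needs two devices
    if not verify_min_gpu_count(min_gpus):
        print(f"This example requires at least {min_gpus} GPUs; exiting.")
        sys.exit(0)
```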
jafraustro (Contributor, Author)

Hi @soumith,

The DDP step needs two GPUs.

Fix:

  • Added a verify_min_gpu_count() function to check for sufficient GPU resources.
  • Updated the HybridModel class to use rank-based device assignment instead of generic device handling, improving device placement consistency across distributed processes.
  • Implemented proper cleanup by adding dist.destroy_process_group() calls for trainer processes (see the sketch below).
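
A hedged sketch of the rank-based placement and cleanup described above; HybridModel is reduced to its local dense part (the real example also holds a remote EmbeddingBag via RPC), and the layer sizes, backend choice, and run_trainer helper are illustrative:

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

class HybridModel(torch.nn.Module):
    def __init__(self, rank: int):
        super().__init__()
        # Rank-based device assignment: each trainer pins its dense layer to the
        # device matching its rank, keeping placement consistent across processes.
        acc = torch.accelerator.current_accelerator()
        assert acc is not None, "this sketch assumes an accelerator is present"
        self.device = torch.device(acc.type, rank)
        self.fc = DDP(torch.nn.Linear(16, 8).to(self.device), device_ids=[rank])

    def forward(self, x):
        return self.fc(x.to(self.device))

def run_trainer(rank: int, world_size: int):
    # Assumes MASTER_ADDR/MASTER_PORT are already set in the environment.
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    model = HybridModel(rank)
    # ... training loop ...
    # Explicit teardown so trainer processes do not leak process-group resources.
    dist.destroy_process_group()
```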
