In the Megatron-LM repo (https://github.com/NVIDIA/Megatron-LM/blob/4429e8ebe21fb011529d7401c370841ce530785a/megatron/training/arguments.py#L779), larger values of CUDA_DEVICE_MAX_CONNECTIONS are recommended for FSDP, but Megatron's tensor parallelism requires it to be 1.
Does the same constraint apply to the torch-native TP implementation built on DTensor?
How should I configure this environment variable when using the torch implementations of FSDP(2) and/or TP/CP/SP?
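For context, this is roughly how I set it today (a minimal sketch; as far as I understand, the variable is read when the CUDA context is created, so it has to be set before the first CUDA call):

```python
import os

# Set before any CUDA initialization (e.g. before importing/using torch.cuda);
# the runtime reads this once at context creation.
os.environ["CUDA_DEVICE_MAX_CONNECTIONS"] = "1"  # the value Megatron TP requires;
                                                 # unclear if DTensor-based TP needs it too

print(os.environ["CUDA_DEVICE_MAX_CONNECTIONS"])
```

In practice I export it in the launch script before `torchrun`, but setting it at the top of the training entrypoint (before any torch CUDA usage) should be equivalent.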