
[Question] FSDP+TP CUDA_DEVICE_MAX_CONNECTIONS #1147

Open
@ChenchaoZhao

Description


In Megatron repo https://github.com/NVIDIA/Megatron-LM/blob/4429e8ebe21fb011529d7401c370841ce530785a/megatron/training/arguments.py#L779

It’s recommended that FSDP use larger values of CUDA_DEVICE_MAX_CONNECTIONS, but Megatron TP requires it to be 1. Is that also the case for the torch implementation of TP using DTensor?

How should I configure this environment variable when using the torch implementation of FSDP (or FSDP2) and/or TP/CP/SP?
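For context, CUDA_DEVICE_MAX_CONNECTIONS is read by the CUDA runtime when a device context is created, so whatever value is chosen has to be in the environment before any CUDA work starts (typically exported in the shell or the launch command, or set at the very top of the entry point). A minimal sketch of setting it in-process; the value 8 is purely illustrative, not a recommendation from this thread:

```python
import os

# CUDA_DEVICE_MAX_CONNECTIONS must be set before the first CUDA
# context is created, i.e. before importing/initializing anything
# that touches the GPU. setdefault respects a value already
# exported by the launcher (e.g. torchrun).
# The value "8" below is illustrative only.
os.environ.setdefault("CUDA_DEVICE_MAX_CONNECTIONS", "8")

print(os.environ["CUDA_DEVICE_MAX_CONNECTIONS"])
```

Setting it after CUDA initialization has no effect, which is why launchers usually export it rather than scripts assigning it mid-run.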

Metadata

Assignees

No one assigned

    Labels

    documentation (Improvements or additions to documentation), module: fsdp, question (Further information is requested)
