This repository was archived by the owner on Mar 23, 2023. It is now read-only.

[Compatibility] Running OPT using PyTorch 1.12 and Gemini placement_policy = 'cuda' failed #166

@feifeibear

Description


🐛 Describe the bug

Just running examples/language/opt/run_clm.py will reproduce the error.
The program crashes with no error information.
After I replaced placement_policy with 'cuda', it works.

    placement_policy = 'cuda'
    chunk_manager = ChunkManager(chunk_size, process_group=pg,
                                 enable_distributed_storage=True,
                                 init_device=GeminiManager.get_default_device(placement_policy))
    gemini_manager = GeminiManager(placement_policy, chunk_manager)
    model = ZeroDDP(model, gemini_manager)
    logger.info(f'{model.__class__.__name__} has been created', ranks=[0])
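For context, a minimal sketch of the mapping the snippet above relies on, where GeminiManager.get_default_device picks the initial chunk device from the placement policy. The mapping below is an assumption inferred from the reported behavior, not ColossalAI's actual implementation:

```python
def default_init_device(placement_policy: str) -> str:
    # Assumed behavior: 'cuda' keeps chunks on GPU from the start,
    # while other policies (e.g. 'cpu', 'auto') initialize on CPU
    # and migrate chunks to GPU on demand.
    return 'cuda' if placement_policy == 'cuda' else 'cpu'
```

With placement_policy = 'cuda', parameters never need to be migrated from host memory, which is consistent with the crash disappearing once that policy is used.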

Environment

colossalai 0.1.8+torch1.12cu11.3
