Qwen35 sp fix by ulysses #127
Closed
+1,753
−47
6 commits
abf8497 support qwen35 sp (meichangsu1)
b331436 fix qwen3.5 sp by using ulysses (meichangsu1)
bc739b1 delete unused files (meichangsu1)
1cf60ee delete unusesd files (meichangsu1)
22a71f5 fix bug (meichangsu1)
75d006a feat: standardize import formatting and fix attention implementation … (meichangsu1)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -45,14 +45,31 @@ def __init__(self, |
| | | self.max_retries = kwargs.pop('max_retries', 20) |
| | | self.min_batch_size = min_batch_size |
| | | if device_mesh is not None: |
| | | - assert batch_size >= device_mesh.data_world_size and batch_size % device_mesh.data_world_size == 0 |
| | | - self.batch_size = batch_size |
| | | + required_world_size = self._required_data_world_size(device_mesh) |
| | | + assert batch_size >= required_world_size and batch_size % required_world_size == 0 |
| | | + self.batch_size = self._resolve_runtime_batch_size(batch_size, device_mesh) |
| | | self.dataloader_params = kwargs |
| | | - self.dataloader_params['batch_size'] = batch_size |
| | | + self.dataloader_params['batch_size'] = self.batch_size |
| | | self.device_mesh = device_mesh |
| | | self.processor: Optional[InputProcessor] = None |
| | | self._set_work_init_fn() |
| | | |
| | | + @staticmethod |
| | | + def _required_data_world_size(device_mesh: Optional[DeviceMesh]) -> int: |
| | | + if device_mesh is None: |
| | | + return 1 |
| | | + if (device_mesh.ulysses_size or 1) > 1: |
| | | + return device_mesh.raw_data_world_size |
| | | + return device_mesh.data_world_size |
| | | + |
| | | + def _resolve_runtime_batch_size(self, batch_size: int, device_mesh: Optional[DeviceMesh]) -> int: |
| | | + if device_mesh is None: |
| | | + return batch_size |
| | | + ulysses_size = device_mesh.ulysses_size or 1 |
| | | + if ulysses_size <= 1: |
| | | + return batch_size |
| | | + return batch_size // ulysses_size |
| | | + |
| | | def _set_work_init_fn(self): |
| | | num_workers = self.dataloader_params.get('num_workers', 2) |
| | | self.dataloader_params['worker_init_fn'] = partial( |

Review comments attached at the `@staticmethod` line (above `_required_data_world_size`):

Collaborator
Strictly speaking, data_world_size should already account for ulysses.

Collaborator
Checking world_size inside a specific component is not a good fit; it would be better to consolidate that logic in device_mesh.
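To make the batch-size arithmetic in this hunk concrete, here is a minimal standalone sketch. `FakeMesh` is a hypothetical stand-in for `DeviceMesh` that carries only the three attributes the diff reads (`data_world_size`, `raw_data_world_size`, `ulysses_size`), and `check_and_resolve` is an illustrative helper that replaces the `assert` with an exception. The numbers are assumptions chosen for illustration; how the real `DeviceMesh` derives these attributes is not shown in the diff.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeMesh:
    """Hypothetical stand-in for DeviceMesh; only the fields the diff reads."""
    data_world_size: int
    raw_data_world_size: int
    ulysses_size: Optional[int] = None


def required_data_world_size(mesh: Optional[FakeMesh]) -> int:
    # Mirrors _required_data_world_size from the diff.
    if mesh is None:
        return 1
    if (mesh.ulysses_size or 1) > 1:
        return mesh.raw_data_world_size
    return mesh.data_world_size


def resolve_runtime_batch_size(batch_size: int, mesh: Optional[FakeMesh]) -> int:
    # Mirrors _resolve_runtime_batch_size from the diff.
    if mesh is None:
        return batch_size
    ulysses_size = mesh.ulysses_size or 1
    if ulysses_size <= 1:
        return batch_size
    return batch_size // ulysses_size


def check_and_resolve(batch_size: int, mesh: Optional[FakeMesh]) -> int:
    """Validate divisibility, then return the batch size handed to the dataloader."""
    required = required_data_world_size(mesh)
    if batch_size < required or batch_size % required != 0:
        raise ValueError(
            f'batch_size={batch_size} must be a positive multiple of {required}')
    return resolve_runtime_batch_size(batch_size, mesh)


# Illustrative values only: with ulysses_size=2 the check uses
# raw_data_world_size, and the dataloader batch size is halved.
mesh = FakeMesh(data_world_size=4, raw_data_world_size=8, ulysses_size=2)
runtime_batch_size = check_and_resolve(16, mesh)
```

Written this way, both helpers depend only on attributes of the mesh, which is what the reviewers' suggestion to move the logic into `device_mesh` relies on.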
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -1,3 +1,32 @@ |
| | | # Copyright (c) ModelScope Contributors. All rights reserved. |
| | | - from .multi_lora_transformers import MultiLoraTransformersModel |
| | | - from .transformers import TransformersModel |
| | | + from typing import TYPE_CHECKING |
| | | + |
| | | + from twinkle.utils.import_utils import _LazyModule |
| | | + |
| | | + if TYPE_CHECKING: |
| | | + from .models import (TwinkleQwen3_5DecoderLayer, TwinkleQwen3_5ForCausalLM, TwinkleQwen3_5GatedDeltaNet, |
| | | + TwinkleQwen3_5PreTrainedModel, TwinkleQwen3_5TextModel) |
| | | + from .multi_lora_transformers import MultiLoraTransformersModel |
| | | + from .transformers import TransformersModel |
| | | + else: |
| | | + _import_structure = { |
| | | + 'transformers': ['TransformersModel'], |
| | | + 'multi_lora_transformers': ['MultiLoraTransformersModel'], |
| | | + 'models': [ |
| | | + 'TwinkleQwen3_5PreTrainedModel', |
| | | + 'TwinkleQwen3_5TextModel', |
| | | + 'TwinkleQwen3_5DecoderLayer', |
| | | + 'TwinkleQwen3_5GatedDeltaNet', |
| | | + 'TwinkleQwen3_5ForCausalLM', |
| | | + ], |
| | | + } |
| | | + |
| | | + import sys |
| | | + |
| | | + sys.modules[__name__] = _LazyModule( |
| | | + __name__, |
| | | + globals()['__file__'], |
| | | + _import_structure, |
| | | + module_spec=__spec__, # noqa |
| | | + extra_objects={}, |
| | | + ) |
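The hunk above swaps eager imports for a name-to-submodule map that is resolved on first access. The internals of `twinkle.utils.import_utils._LazyModule` are not shown in this PR, so the following is only a sketch of the general technique under stated assumptions: `LazyNamespace` is a hypothetical class, and it maps names to absolute standard-library modules (`json`, `math`) rather than to relative submodules, purely so the example can run on its own.

```python
import importlib
from types import ModuleType
from typing import Dict, List


class LazyNamespace(ModuleType):
    """Hypothetical sketch: resolve exported names on first attribute access."""

    def __init__(self, name: str, import_structure: Dict[str, List[str]]):
        super().__init__(name)
        # Invert {module: [names]} into {name: module} for fast lookup.
        self._name_to_module = {
            attr: module
            for module, attrs in import_structure.items()
            for attr in attrs
        }
        self.__all__ = list(self._name_to_module)

    def __getattr__(self, attr: str):
        # Only reached when normal lookup fails, i.e. before the name is cached.
        try:
            module_name = self._name_to_module[attr]
        except KeyError:
            raise AttributeError(
                f'module {self.__name__!r} has no attribute {attr!r}') from None
        value = getattr(importlib.import_module(module_name), attr)
        setattr(self, attr, value)  # cache so later lookups skip __getattr__
        return value


# Assumed demo structure; the PR maps names to package submodules instead.
lazy = LazyNamespace('demo', {'json': ['dumps'], 'math': ['sqrt']})
loaded_before = 'sqrt' in vars(lazy)  # nothing resolved yet
root = lazy.sqrt(9.0)                 # first access triggers the import
loaded_after = 'sqrt' in vars(lazy)   # now cached on the namespace
```

The `if TYPE_CHECKING:` branch in the diff complements this pattern: static analyzers still see the real imports, while at runtime heavy submodules such as `models` load only when one of their names is first used.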
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -0,0 +1,11 @@ |
| | | + # Copyright (c) ModelScope Contributors. All rights reserved. |
| | | + from .qwen3_5 import (TwinkleQwen3_5DecoderLayer, TwinkleQwen3_5ForCausalLM, TwinkleQwen3_5GatedDeltaNet, |
| | | + TwinkleQwen3_5PreTrainedModel, TwinkleQwen3_5TextModel) |
| | | + |
| | | + __all__ = [ |
| | | + 'TwinkleQwen3_5PreTrainedModel', |
| | | + 'TwinkleQwen3_5TextModel', |
| | | + 'TwinkleQwen3_5DecoderLayer', |
| | | + 'TwinkleQwen3_5GatedDeltaNet', |
| | | + 'TwinkleQwen3_5ForCausalLM', |
| | | + ] |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -0,0 +1,11 @@ |
| | | + # Copyright (c) ModelScope Contributors. All rights reserved. |
| | | + from .modeling_qwen3_5 import (TwinkleQwen3_5DecoderLayer, TwinkleQwen3_5ForCausalLM, TwinkleQwen3_5GatedDeltaNet, |
| | | + TwinkleQwen3_5PreTrainedModel, TwinkleQwen3_5TextModel) |
| | | + |
| | | + __all__ = [ |
| | | + 'TwinkleQwen3_5PreTrainedModel', |
| | | + 'TwinkleQwen3_5TextModel', |
| | | + 'TwinkleQwen3_5DecoderLayer', |
| | | + 'TwinkleQwen3_5GatedDeltaNet', |
| | | + 'TwinkleQwen3_5ForCausalLM', |
| | | + ] |
What is the reasoning behind the logic here?