docs/source_en/Instruction/Megatron-SWIFT-Training.md (+4 -3)
@@ -1,6 +1,8 @@
 
 # Megatron-SWIFT Training
 
+SWIFT incorporates Megatron's parallelization techniques to accelerate the training of large models, including data parallelism, tensor parallelism, pipeline parallelism, sequence parallelism, and context parallelism. For models that support Megatron training, please refer to the [Supported Models and Datasets documentation](./Supported-models-and-datasets.md).
+
 ## Environment Setup
 
 To use Megatron-SWIFT, in addition to installing the `swift` dependencies, you also need to install the following:
-The dependency library Megatron-LM will be git cloned and installed by swift, no manual installation by the user is required. You can also use the environment variable `MEGATRON_LM_PATH` to point to the already downloaded repo path (for offline environments).
+The dependency library Megatron-LM will be git cloned and installed by swift, no manual installation by the user is required. You can also use the environment variable `MEGATRON_LM_PATH` to point to the already downloaded repo path (for offline environments, use the [core_r0.11.0 branch](https://github.com/NVIDIA/Megatron-LM/tree/core_r0.11.0)).
 
 
 ## Quick Start Example
@@ -99,7 +101,7 @@ I am a language model developed by swift, you can call me swift-robot. How can I
 ```
 
 - More cases can be viewed [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/megatron).
-
+- For pretraining, you can use `megatron pt` instead of `megatron sft`, which will use a generative template for training.
 
 ## Command Line Arguments
 
@@ -215,7 +217,6 @@ I am a language model developed by swift, you can call me swift-robot. How can I
 - position_embedding_type: Type of positional embedding, options are 'learned_absolute', 'rope', 'relative', and 'none'. Default is 'rope'.
 - rotary_base: Default is 10000.
 - rotary_percent: Default is 1.
-- rotary_seq_len_interpolation_factor: Sequence length interpolation factor, default is None.
 - normalization: Options are 'LayerNorm', 'RMSNorm'. Default is RMSNorm.
 - norm_epsilon: Default is 1e-5.
 - swiglu: Uses swiglu instead of the default gelu. Default is True.
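
For readers unfamiliar with the rotary options kept in this hunk, the following is a minimal sketch of how `rotary_base` and `rotary_percent` are conventionally used to derive rotary embedding frequencies. It illustrates the standard RoPE formula only. The function `rope_inv_freq` and the `head_dim` parameter are hypothetical names chosen for this example; they are not taken from Megatron-LM or ms-swift, whose actual implementation may differ.

```python
from typing import List


def rope_inv_freq(head_dim: int,
                  rotary_base: float = 10000.0,
                  rotary_percent: float = 1.0) -> List[float]:
    """Illustrative RoPE inverse frequencies (hypothetical helper, not library code)."""
    # Assumption: when rotary_percent < 1, only that fraction of each
    # attention head's channels receives the rotary rotation.
    rotary_dim = int(head_dim * rotary_percent)
    # One frequency per channel pair: rotary_base ** (-2i / rotary_dim).
    return [rotary_base ** (-(2 * i) / rotary_dim) for i in range(rotary_dim // 2)]


def rope_angles(position: int, inv_freq: List[float]) -> List[float]:
    """Rotation angle applied to each channel pair at a given token position."""
    return [position * f for f in inv_freq]


if __name__ == "__main__":
    inv_freq = rope_inv_freq(head_dim=128)  # documented defaults: base 10000, percent 1
    print(len(inv_freq))   # 64 channel pairs
    print(inv_freq[0])     # 1.0 (the highest frequency)
    print(rope_angles(3, inv_freq)[:2])
```

With the documented defaults every channel pair is rotated. Passing `rotary_percent=0.5` in this sketch would rotate only half of the channels and leave the rest position-independent.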