This is the changelog for MindSpore Transformers version 1.5.0. Compared with version 1.3.2, it includes the following key new features and bug fixes.
- Distributed Parallelism: Added the Seq Pipe feature and the hybrid sequence parallelism feature.
- Weights: Added support for weights in Safetensors format, including saving Safetensors weights with redundancy removed.
- Datasets: Added Packing support for Hugging Face datasets, and EOD mask compression support for Megatron multi-source mixed datasets.
- Training Monitor: Added real-time visualization of training metrics in TensorBoard.
- High Availability: Added the end-of-life checkpoint (CKPT) function, UCE fault-tolerant recovery, and process-level rescheduling recovery.
- Heterogeneous Storage: Added fine-grained SWAP of activation values during training.
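To make the Packing and EOD mask compression items more concrete, here is a minimal sketch of the general idea. It is not the MindSpore Transformers implementation. The function names `pack_samples` and `_pad_row`, the greedy packing strategy, and the choice of cumulative sample boundaries as the "compressed" stand-in for a full attention mask are all assumptions made for illustration.

```python
from typing import List, Tuple

# One packed row: (padded token ids, cumulative end offset of each sample in the row).
Row = Tuple[List[int], List[int]]


def _pad_row(tokens: List[int], boundaries: List[int], seq_length: int, pad_id: int) -> Row:
    """Pad a partially filled row to seq_length and copy its boundaries (hypothetical helper)."""
    padding = [pad_id] * (seq_length - len(tokens))
    return tokens + padding, list(boundaries)


def pack_samples(samples: List[List[int]], seq_length: int, pad_id: int = 0) -> List[Row]:
    """Greedily pack tokenized samples into rows of exactly seq_length tokens.

    Instead of a dense (seq_length x seq_length) mask per row, each row carries
    only the end offsets of its samples, from which a block-diagonal mask could
    be rebuilt when needed. This mirrors the idea of a compressed mask; the real
    feature may represent it differently.
    """
    if seq_length <= 0:
        raise ValueError("seq_length must be positive")

    rows: List[Row] = []
    tokens: List[int] = []
    boundaries: List[int] = []

    for sample in samples:
        sample = sample[:seq_length]  # truncate samples longer than one row
        if not sample:
            continue  # skip empty samples
        if len(tokens) + len(sample) > seq_length:
            # Current row cannot hold this sample: emit it and start a new one.
            rows.append(_pad_row(tokens, boundaries, seq_length, pad_id))
            tokens, boundaries = [], []
        tokens.extend(sample)
        boundaries.append(len(tokens))

    if tokens:
        rows.append(_pad_row(tokens, boundaries, seq_length, pad_id))
    return rows


if __name__ == "__main__":
    for packed_tokens, sample_ends in pack_samples([[1, 2, 3], [4, 5], [6, 7, 8, 9]], seq_length=6):
        print(packed_tokens, sample_ends)
```

Storing a short list of offsets per row rather than a dense mask keeps memory proportional to the number of samples in the row, which is the motivation usually given for this kind of compression.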
The following new models are supported:
| Models | Specifications |
|---|---|
| DeepSeek-V3/R1 | DeepSeek-V3-671B (pre-training, fine-tuning, inference), DeepSeek-R1-671B (inference) |
| Llama3.2 | Llama3.2-3B (inference), Llama3.2-Vision-11B (fine-tuning, inference) |
| Qwen2.5 | Qwen2.5-0.5B/1.5B (inference), Qwen2.5-7B/14B/32B/72B (fine-tuning, inference) |
| TeleChat2 | TeleChat2-7B/35B/115B (fine-tuning, inference) |
| YiZhao | YiZhao-12B (pre-training, fine-tuning) |
During this release cycle, we fixed many bugs across models, functionality, usability, and documentation. Some of the key fixes are listed below:
- !6013: Fixed an incompatibility between context parallelism (cp) and sequence parallelism (use_seq_parallel).
- !6007: Fixed an issue where the maximum number of checkpoints to keep during training (keep_checkpoint_max) did not apply to checkpoints that contain only model parameters.
- !83880: Fixed overflow detection failing when gradients overflow on large clusters.
- !80845, !80861: Fixed an issue where Llama models reported an error when ConstantWarmUpLR was enabled with the compilation cache turned on.
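As a rough illustration of the behavior that the keep_checkpoint_max fix restores, the sketch below applies the same retention limit to every kind of checkpoint. It is not taken from the MindSpore Transformers source. The function name `select_checkpoints_to_delete`, the grouping keys `"full"` and `"params_only"`, and the use of training step numbers to identify checkpoints are hypothetical.

```python
from typing import Dict, List


def select_checkpoints_to_delete(
    saved_steps: Dict[str, List[int]], keep_checkpoint_max: int
) -> Dict[str, List[int]]:
    """Return, per checkpoint kind, the steps whose checkpoints should be deleted.

    The newest keep_checkpoint_max checkpoints of each kind are retained;
    everything older is selected for deletion.
    """
    if keep_checkpoint_max < 1:
        raise ValueError("keep_checkpoint_max must be at least 1")

    to_delete: Dict[str, List[int]] = {}
    for kind, steps in saved_steps.items():
        ordered = sorted(steps)  # oldest first
        # Slicing with a negative stop drops the newest keep_checkpoint_max
        # entries; it yields an empty list when fewer checkpoints exist.
        to_delete[kind] = ordered[:-keep_checkpoint_max]
    return to_delete


if __name__ == "__main__":
    saved = {"full": [100, 200, 300, 400], "params_only": [100, 200, 300, 400]}
    print(select_checkpoints_to_delete(saved, keep_checkpoint_max=2))
```

Applying the limit per kind means that parameter-only checkpoints are pruned on the same schedule as full checkpoints, rather than accumulating without bound.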
In the current version, some deprecated legacy models, code, and documentation have been changed. The details are as follows:
| Change Content | Change Description |
|---|---|
| Retired the code, configuration files, and materials of deprecated models | The models involved are Bloom, BaiChuan, BaiChuan2, CodeGeeX, CodeGeeX2, GLM, GLM2, VisualGLM, InternLM, PanguAlpha, SAM, SkyWork, WizardCoder, Qwen, Ziya, and Llama |
| Retired the code of deprecated interfaces | The interfaces involved are CompareLoss, FusedCastAdamWeightDecay, MultiImgCapDataLoader, MultiImgCapDataset, ImageToTextRetrievalTrainer, auto_augment, group_ic_params, group_mim_parameters, and TokenClassificationTrainer |
| Retired the old version of the official documentation | The files for the old version of the documentation are no longer kept in the repository. Subsequent official documentation is available at MindSpore Transformers Official Documentation |
Thanks to the following people for their contributions:
chengxianbin, Chong Li, ehaleva, hangangqiang, huangshengshuai, huangzhuo, leida, lilei, limengyuan, liubuyu, lizhihao, moran, wangpingan, wangshaocong, wudawei, wutiancheng, wuweikang, yangminghai, yao_yf, zhanzhan, ZhouJingfeng, zhouyaqiang, 常少中, 陈心锐, 陈昱坤, 程泽睿志, 樊瑞, 范益, 封霆谚, 冯浩, 葛煜洪, 郭儒辰, 何泽泉, 胡安东, 胡思超, 胡志坤, 宦晓玲, 黄靖伟, 黄磊, 黄新元, 黄勇, 黄志超, 黄子灵, 季文尚, 金仁操, 孔紫怡, 蓝翔, 李嘉坤, 李俊标, 李子垠, 林盈来, 刘晨晖, 刘烙彬, 刘力力, 刘言伟, 马成贵, 倪钰鑫, 牛君豪, 彭竞由, 秦思莼, 任峪瑾, 赛尧, 苏海波, 孙宇轩, 谭纬城, 唐德志, 汪家傲, 王浩然, 王振邦, 魏琢艺, 吴昊天, 吴治锋, 吴致远, 肖尧, 尤日帆, 俞涵, 张丹阳, 张浩, 张敏利, 张森镇, 张奕晖, 张又文, 赵奕舜, 周声煦, 周小琪, 祝建伟, 邹文祥
Contributions to the project in any form are welcome!