
Add seq parallelism for attention and MoE MLP #1328

Closed (53 commits)

Conversation

@suexu1025 (Collaborator) commented Mar 1, 2025

Description

  1. Add seq_parallelism + exp_parallelism for the attention module and the MoE MLP
     module that follows it; with sp+ep, customer MoE 2k-seq inference improved by 20%.
  2. Fix the prefill_KV_cache sharding mismatch during seq_parallelism.
  3. Decode improved by 10%.
  4. Enable inference auto layout in the Mistral model.

FIXES: b/374773995
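The core idea in item 1, splitting activations along the sequence dimension across a mesh axis, can be sketched in JAX. This is a minimal illustration loosely analogous to `ici_context_parallelism`; the mesh-axis name `"context"` and the variable names are assumptions, not MaxText identifiers:

```python
# Illustrative sketch of sequence-axis sharding (NOT MaxText's actual code).
# The "context" mesh axis name is an assumption for this example.
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Build a 1-D mesh over all available devices.
mesh = Mesh(np.array(jax.devices()), ("context",))

batch, seq, emb = 2, 8, 4
x = jnp.zeros((batch, seq, emb))

# Shard the sequence dimension over the "context" axis;
# batch and embedding dimensions remain replicated.
x_sharded = jax.device_put(x, NamedSharding(mesh, P(None, "context", None)))
assert x_sharded.shape == (batch, seq, emb)
```

With `ici_context_parallelism=4`, each of the four devices would then hold a `seq/4` slice of the activations, which is why the KV cache and attention masks must be resharded consistently (item 2).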

Tests

Tested on v6e/v5p:
SEQ=2048

python MaxText/inference_microbenchmark.py MaxText/configs/inference.yml \
  max_prefill_predict_length=$SEQ max_target_length=6144 model_name=mixtral-8x7b \
  ici_fsdp_parallelism=1 ici_autoregressive_parallelism=1 ici_expert_parallelism=1 \
  ici_context_parallelism=4 ici_tensor_parallelism=1 scan_layers=false \
  per_device_batch_size=1 attention=dot_product megablox=False quantization=int8 \
  checkpoint_is_quantized=True quantize_kvcache=True capacity_factor=1 \
  tokenizer_path=assets/tokenizer.mistral-v3 compute_axis_order=0,2,1,3 \
  ar_cache_axis_order=0,2,1,3 enable_jax_profiler=True \
  inference_microbenchmark_prefill_lengths="$SEQ" base_output_directory=$OUT_DIR \
  run_name=$RUN_NAME profiler=xplane model_call_mode=inference \
  inference_microbenchmark_stages=prefill
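Several commits in this PR touch mask generation under sequence parallelism ("update sub_seq for masks", "add comment to explain grouping in generate_mask for moe model"). As background, when the query sequence is split into sub-sequences, each shard only needs its own rows of the causal mask, but over all key positions. A minimal NumPy sketch; `shard_causal_mask` is a hypothetical helper for illustration, not a MaxText function:

```python
import numpy as np

def shard_causal_mask(seq_len: int, num_shards: int, shard_id: int) -> np.ndarray:
    """Rows of the full causal mask owned by one sequence shard.

    Hypothetical helper: with sequence parallelism the query dimension is
    split into sub-sequences, so each shard builds only its block of rows,
    indexed by GLOBAL query positions, against all key positions.
    """
    sub = seq_len // num_shards
    rows = np.arange(shard_id * sub, (shard_id + 1) * sub)[:, None]  # global query idx
    cols = np.arange(seq_len)[None, :]                               # global key idx
    return cols <= rows  # True where attention is allowed

# Stacking every shard's rows reconstructs the full causal mask.
stacked = np.concatenate([shard_causal_mask(8, 4, i) for i in range(4)])
print(np.array_equal(stacked, np.tril(np.ones((8, 8), dtype=bool))))  # True
```

The key detail is that row indices must be the shard's global positions, not local ones; using local indices is exactly the kind of mismatch that produces wrong masks under sharding.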

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed.


google-cla bot commented Mar 1, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@suexu1025 suexu1025 changed the title [Draft] Add seq parallelism for attention and MLP Add seq parallelism for attention and MoE MLP Mar 6, 2025
@suexu1025 suexu1025 requested a review from RissyRan March 7, 2025 22:59
@RissyRan (Collaborator) left a comment

Thanks Qinwen! It would be great if you could run a few steps of training tests with profiles. We have previously hit issues where inference optimizations caused degradation of training performance.

@suexu1025 suexu1025 requested a review from RissyRan March 11, 2025 02:11
@suexu1025 (Collaborator, Author) left a comment

address comments

copybara-service bot pushed a commit that referenced this pull request Mar 11, 2025
--
7b8f711 by ZhiyuLi-goog <[email protected]>:

[exp] seq exp sharding

--
802313a by ZhiyuLi-goog <[email protected]>:

update

--
d8d4595 by ZhiyuLi-goog <[email protected]>:

update

--
b7f3225 by Qinwen Xu <[email protected]>:

merge for sp

--
7ed5fd1 by Qinwen Xu <[email protected]>:

fix merge parts

--
a1c6973 by Qinwen Xu <[email protected]>:

update merge conflict base config

--
3f2d278 by Qinwen Xu <[email protected]>:

update to fix sharding mismatch

--
3e06ebb by Qinwen Xu <[email protected]>:

update sub_seq for masks

--
d23d27b by Qinwen Xu <[email protected]>:

update sharding axis

--
924ce77 by Qinwen Xu <[email protected]>:

update with reshape

--
b62812d by Qinwen Xu <[email protected]>:

solve merge conflict

--
746f4a3 by Qinwen Xu <[email protected]>:

update for generate sharding

--
a6d345c by Qinwen Xu <[email protected]>:

enable compute_axis configurable in mixtral model

--
e06c3d6 by Qinwen Xu <[email protected]>:

address output_logits sharding

--
65a64d4 by Qinwen Xu <[email protected]>:

clean up

--
10a9d82 by Qinwen Xu <[email protected]>:

update

--
0cca6df by Qinwen Xu <[email protected]>:

update

--
ebae8e0 by Qinwen Xu <[email protected]>:

fix tests

--
2e0c459 by Qinwen Xu <[email protected]>:

add condition for non-sharded kernel for cp during inference only

--
37c843e by Qinwen Xu <[email protected]>:

update

--
b63c63b by Qinwen Xu <[email protected]>:

bug fix

--
4007e7c by Qinwen Xu <[email protected]>:

fix tests

--
72f2a90 by Qinwen Xu <[email protected]>:

address comment

--
8da48f5 by Qinwen Xu <[email protected]>:

update

--
8a43dd5 by Qinwen Xu <[email protected]>:

address comments

--
56deeda by Qinwen Xu <[email protected]>:

address comments

--
1c6be59 by Qinwen Xu <[email protected]>:

revert

--
bd0e199 by Qinwen Xu <[email protected]>:

address lint

--
44d646f by Qinwen Xu <[email protected]>:

reformat for lint

--
5172068 by Qinwen Xu <[email protected]>:

update MOE test

--
d6787c3 by Qinwen Xu <[email protected]>:

add comment to explain grouping in generate_mask for moe model

--
f964acd by Qinwen Xu <[email protected]>:

address the comments

--
c5174de by Qinwen Xu <[email protected]>:

update to fix tests

--
b86e035 by Qinwen Xu <[email protected]>:

separate yml for inference

--
e96340e by Qinwen Xu <[email protected]>:

update to address training perf difference

--
7446563 by Qinwen Xu <[email protected]>:

update

--
3b5346f by Qinwen Xu <[email protected]>:

revert back mask_shape for tests

--
b7dcb1e by Qinwen Xu <[email protected]>:

added back reshape and clean up merge changes

--
c859b4c by Qinwen Xu <[email protected]>:

address comment to remove reshape

--
7d91629 by Qinwen Xu <[email protected]>:

update with different softmax score for inference/training for mask_generate

--
1a0fdb3 by Qinwen Xu <[email protected]>:

lint

COPYBARA_INTEGRATE_REVIEW=#1328 from AI-Hypercomputer:qinwen/sharding_merge_main 4d94d95
PiperOrigin-RevId: 735916344

5 participants