hsuan-lun-chiang
Collaborator

Description

Migrate the GPT-OSS implementation from Flax Linen to Flax NNX.

Tests

Ran train command to train gpt-oss for 20 steps:

python3 -m MaxText.train src/MaxText/configs/base.yml \
    base_output_directory=gs://maxtext-test/gpt-oss-train/ \
    run_name=megablox_pre_training \
    model_name=gpt-oss-20b \
    tokenizer_type=huggingface \
    tokenizer_path=openai/gpt-oss-20b \
    dataset_type=synthetic \
    enable_checkpointing=true \
    attention=flash \
    sparse_matmul=True \
    megablox=True \
    dtype=bfloat16 \
    weight_dtype=bfloat16 \
    per_device_batch_size=4 \
    steps=30 \
    max_target_length=1024 \
    ici_fsdp_parallelism=8

Logs:

Logs - Before Migration
Logs - After Migration

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed.

hsuan-lun-chiang force-pushed the feat/Migrate-Gpt-OSS-to-NNX branch from cd60498 to 4192792 on October 2, 2025 07:01
@RissyRan
Collaborator

RissyRan commented Oct 2, 2025

I see you added the gemini-review label and it didn't work. We have a rule that checks whether the branch comes from a fork here

@hsuan-lun-chiang
Collaborator Author

> I see you added the gemini-review label and it didn't work. We have a rule that checks whether the branch comes from a fork here

I wasn’t aware of that rule, thank you for pointing it out!

Collaborator

@bvandermoon bvandermoon left a comment

Thanks @hsuan-lun-chiang. Could you please run train (you already have this), decode, and then maxengine/jetstream (with profiles collected for maxengine/jetstream)? Similar to #2088
