
Add 11L XSA11 + BigramHash3072 + AdamW Legal TTT submission#841

Open
someone114514 wants to merge 4 commits into openai:main from someone114514:xsa11-bigram3072-adamw-legal-ttt

Conversation

@someone114514

Summary

Adds a new 16 MB submission folder with:

  • 11-layer 512d transformer
  • XSA on all 11 layers
  • BigramHash with 3072 buckets and dim 112
  • Parameter Banking + Parallel Muon
  • score-first legal TTT with AdamW
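The BigramHash component above can be sketched as a hashed embedding lookup over token pairs. Only the bucket count (3072) and embedding dim (112) come from the PR; the mixing constants, the `bigram_bucket` helper name, and the BOS handling below are illustrative assumptions, not the actual scheme in `train_gpt.py`.

```python
# Sketch of a hashed bigram embedding index (assumed scheme).
BIGRAM_VOCAB = 3072   # number of hash buckets (from the PR summary)
BIGRAM_DIM = 112      # embedding width per bucket (from the PR summary)

def bigram_bucket(prev_tok: int, cur_tok: int, n_buckets: int = BIGRAM_VOCAB) -> int:
    """Map a (prev, cur) token pair to one of n_buckets via a cheap mix hash."""
    h = (prev_tok * 1000003 + cur_tok) & 0xFFFFFFFF
    h ^= h >> 16
    return h % n_buckets

def bigram_buckets(tokens):
    """Bucket index per position; position 0 is paired with an assumed BOS id 0."""
    prev = 0
    out = []
    for t in tokens:
        out.append(bigram_bucket(prev, t))
        prev = t
    return out
```

Each position's bucket index would then select a 112-dim row from a `3072 × 112` embedding table that is added to (or concatenated with) the token embedding.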

Included Files

  • README.md
  • submission.json
  • train_seed1337.log
  • train_gpt.py

Result

  • legal_ttt_exact val_bpb: 1.11565196
  • artifact size: 15,983,339 bytes

Notes

  • sliding window stride: 64
  • TTT chunk size: 131072
  • TTT epochs: 3
  • TTT freeze blocks: 8
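The score-first legal TTT flow implied by these settings can be sketched as follows: each 131072-token eval chunk is scored with the current weights *before* AdamW adapts the unfrozen blocks to it, so no chunk's loss is ever measured after the model has trained on that chunk. The constants come from the Notes above; the hook names and loop structure are hypothetical.

```python
TTT_CHUNK = 131072    # TTT chunk size (from Notes)
TTT_EPOCHS = 3        # TTT epochs (from Notes)
FREEZE_BLOCKS = 8     # first 8 of the 11 blocks stay frozen (from Notes)

def ttt_schedule(n_tokens: int, chunk: int = TTT_CHUNK):
    """Yield (start, end) chunk boundaries covering the eval stream in order."""
    for start in range(0, n_tokens, chunk):
        yield start, min(start + chunk, n_tokens)

def run_legal_ttt(n_tokens, score_chunk, adapt_chunk):
    """Score-first loop; score_chunk/adapt_chunk are caller-supplied hooks
    (hypothetical names -- e.g. a loss eval and an AdamW step over the chunk
    with blocks 0..FREEZE_BLOCKS-1 excluded from the optimizer)."""
    total_loss = 0.0
    for start, end in ttt_schedule(n_tokens):
        total_loss += score_chunk(start, end)   # score first...
        for _ in range(TTT_EPOCHS):             # ...then adapt with AdamW
            adapt_chunk(start, end)
    return total_loss
```

The ordering is what makes the TTT "legal": the reported `legal_ttt_exact val_bpb` is accumulated only from pre-adaptation scores.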

FlashyFlash3011 added a commit to FlashyFlash3011/parameter-golf that referenced this pull request Mar 26, 2026
Combines NewTest (PR openai#841 base) with SOTA experiments that achieved ~1.12 BPB:
- train_seq_len/eval_seq_len: 2048 → 4096 (long context from user's SOTA exps)
- bigram_vocab_size: 3072 → 2048, bigram_dim: 112 → 128 (proven SOTA settings)
- xsa_last_n: 11 → 4 (from user's best experiments)
- gated_attention + value_residual: enabled by default (PR openai#824/838 show ~0.018 BPB improvement)
- Bank QAT: symmetric int6 STE fake-quant on all weight banks during warmdown
- Fix: CastedLinear QAT clip range (-32,31) → (-31,31) to match export format
- Compression: lzma-6 → zstd-22 (PR openai#824/838: 14.9MB vs ~16MB, critical for fitting under limit)
- Fix: target_mb budget uses decimal MB (1e6) not MiB (1024^2) matching competition rules
- Budget-aware ±1 weight pruning retained from NewTest
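The symmetric int6 fake-quant and the clip-range fix from the commit message can be sketched per weight as below. The `(-31, 31)` range and int6 grid come from the commit; the scale handling and function name are assumptions, and the STE part (passing gradients through the rounding at train time) is omitted since it only matters inside autograd.

```python
QMAX = 31  # symmetric int6: representable levels -31..31

def fake_quant_int6(w: float, scale: float) -> float:
    """Quantize-dequantize one weight: round to the int6 grid, clip, rescale."""
    q = round(w / scale)
    q = max(-QMAX, min(QMAX, q))  # symmetric clip; the earlier (-32, 31) range
                                  # allowed a level the export format can't store
    return q * scale
```

Training against this quantize-dequantize view during warmdown means the exported int6 weights match what the model actually optimized for.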
