SimuMax Docs

This folder holds the public documentation for SimuMax.

Recommended first-time order:

  1. Run a shipped perf example in tutorial.md.
  2. Use the same tutorial to try the PerfLLM API directly.
  3. Copy the nearest existing model, strategy, and system JSONs before creating anything from scratch.
  4. If needed, search for feasible batch settings or parallel strategies.
  5. Only enter the machine-measurement workflow when you need timing accuracy on a new machine.

Use this index by task:

1. Run perf with an existing config

2. Add your own machine/system config

Use the shipped config directly only when the target machine and its dominant operator shapes are already covered. If the machine is new, or system.miss_efficiency is non-empty for the target case, collect your own measurements before interpreting timing results.
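The rule above can be written as a small check. This is only a sketch: `needs_own_measurement` is a hypothetical helper, not part of SimuMax, and it assumes that `miss_efficiency` behaves like an ordinary Python container (dict or list) that is empty when every operator shape is covered.

```python
def needs_own_measurement(machine_is_new, miss_efficiency):
    """Return True when shipped timing data should not be trusted as-is.

    Hypothetical helper, not a SimuMax API.
    machine_is_new:  True if no shipped system config matches the machine.
    miss_efficiency: assumed to be a container of operator shapes that had
                     no measured efficiency entry (empty means fully covered).
    """
    return bool(machine_is_new) or bool(miss_efficiency)
```

If the check returns True, treat any timing output as provisional until the machine has been measured.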

3. Add your own model config

4. Generate simulator trace and memory artifacts

Simulator artifacts help you understand stage-level timing, memory peaks, and cache lifetime, and compare those modeled results against real traces or real memory evidence.

If you only need a smoke test or rough OOM estimate, you do not need to start with trace or snapshot generation.

5. Search batch settings or parallel strategies

Recommended order:

  1. Search micro_batch_size / micro_batch_num first.
  2. Only then expand into a small tp/pp strategy sweep.
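The two-stage order above can be sketched in plain Python. Everything here is illustrative and not a SimuMax API: `evaluate` is a hypothetical callback that you would back with your own perf run, returning a step time or `None` when the setting is infeasible (for example, a modeled OOM). The sketch also assumes the common convention that global batch size equals micro_batch_size × micro_batch_num × dp, with dp = world_size / (tp × pp) and no other parallel dimensions.

```python
def batch_candidates(global_batch_size, dp_size, max_micro_batch_size):
    """Enumerate (micro_batch_size, micro_batch_num) pairs that satisfy
    micro_batch_size * micro_batch_num * dp_size == global_batch_size."""
    if global_batch_size % dp_size != 0:
        return []
    per_replica = global_batch_size // dp_size
    return [
        (mbs, per_replica // mbs)
        for mbs in range(1, max_micro_batch_size + 1)
        if per_replica % mbs == 0
    ]


def best_batch(evaluate, world_size, global_batch_size, tp, pp, max_mbs):
    """Stage 1: with tp/pp fixed, pick the fastest feasible batch setting."""
    if world_size % (tp * pp) != 0:
        return None
    dp = world_size // (tp * pp)  # assumption: only tp, pp, dp are in play
    best = None
    for mbs, mbn in batch_candidates(global_batch_size, dp, max_mbs):
        step_time = evaluate(tp, pp, mbs, mbn)  # hypothetical callback
        if step_time is not None and (best is None or step_time < best[0]):
            best = (step_time, tp, pp, mbs, mbn)
    return best


def two_stage_search(evaluate, world_size, global_batch_size,
                     base, sweep, max_mbs):
    """Search batch settings on the base (tp, pp) first, then repeat the
    same batch search over a small list of alternative (tp, pp) pairs."""
    best = best_batch(evaluate, world_size, global_batch_size, *base, max_mbs)
    for tp, pp in sweep:
        cand = best_batch(evaluate, world_size, global_batch_size,
                          tp, pp, max_mbs)
        if cand is not None and (best is None or cand[0] < best[0]):
            best = cand
    return best  # (step_time, tp, pp, micro_batch_size, micro_batch_num)
```

Keeping the `sweep` list short matches the recommendation: the batch search is cheap and is re-run for each strategy, so the number of tp/pp pairs is what drives the total cost.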

Benchmarks

B200 Public Materials