This folder holds the public documentation for SimuMax.
Recommended first-time order:
- Run a shipped `perf` example in tutorial.md.
- Use the same tutorial to try the `PerfLLM` API directly.
- Copy the nearest existing `model`, `strategy`, and `system` JSONs before creating anything from scratch.
- If needed, search feasible batch settings or parallel strategies.
- Only enter the machine-measurement workflow when you need timing accuracy on a new machine.
Use this index by task:
Use the shipped config directly only when the target machine and dominant operator shapes are already covered. If the machine is new or `system.miss_efficiency` is non-empty for the target case, measure your own data before interpreting timing.
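The rule above can be written as a small guard. This is only a sketch: `should_measure_first` and its `machine_is_new` flag are hypothetical names, not part of SimuMax, and it assumes `system.miss_efficiency` is a container that supports `len()`.

```python
from typing import Sized


def should_measure_first(machine_is_new: bool, miss_efficiency: Sized) -> bool:
    """Return True when shipped timing data should not be trusted as-is.

    Hypothetical helper: `miss_efficiency` stands in for whatever container
    `system.miss_efficiency` holds for the target case (assumed to be sized).
    """
    # A new machine, or any recorded efficiency miss, means measure first.
    return machine_is_new or len(miss_efficiency) > 0
```

When the guard returns `False`, the shipped config can be used directly for timing.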
Simulator artifacts are meant to help users understand stage-level timing, memory peaks, and cache lifetime, and to compare those modeled results against real traces or real memory evidence.
If you only need a smoke test or rough OOM estimate, you do not need to start with trace or snapshot generation.
Recommended order:
- search `micro_batch_size` / `micro_batch_num` first
- only then expand into a small `tp` / `pp` strategy sweep
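The two-phase order above could be sketched as follows. Everything here is illustrative rather than SimuMax's own search code: `fits` is a hypothetical caller-supplied predicate (for example, a wrapper around a simulator run that reports whether the case stays under the memory limit), and the sketch assumes the per-replica batch equals `micro_batch_size * micro_batch_num`.

```python
from typing import Callable, Iterable, Optional, Sequence, Tuple

# Hypothetical predicate: (micro_batch_size, micro_batch_num, tp, pp) -> feasible?
Fits = Callable[[int, int, int, int], bool]


def search_batch(
    per_replica_batch: int,
    mbs_candidates: Iterable[int],
    fits: Fits,
    tp: int,
    pp: int,
) -> Optional[Tuple[int, int]]:
    """Phase 1: with tp/pp fixed, find the largest feasible micro_batch_size."""
    for mbs in sorted(mbs_candidates, reverse=True):
        if per_replica_batch % mbs != 0:
            continue  # micro_batch_num must be an integer
        mbn = per_replica_batch // mbs
        if fits(mbs, mbn, tp, pp):
            return mbs, mbn
    return None


def search(
    per_replica_batch: int,
    mbs_candidates: Sequence[int],
    strategies: Sequence[Tuple[int, int]],
    fits: Fits,
) -> Optional[Tuple[int, int, int, int]]:
    """Phase 2: move to the next (tp, pp) only if the batch search fails.

    `strategies` should list the current strategy first, followed by a small
    number of alternatives.
    """
    for tp, pp in strategies:
        hit = search_batch(per_replica_batch, mbs_candidates, fits, tp, pp)
        if hit is not None:
            return (tp, pp, *hit)
    return None
```

Putting the current strategy first in `strategies` keeps the sweep small: alternative `tp` / `pp` values are tried only after every batch setting has failed on the current one.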
- Release notes: release_v1.2.md
- Public benchmark summary: FULL_RESULTS.md