The comm_gemm_overlap example needs clearer documentation on:
- How to enable GEMM+communication overlap: the process (initialize userbuffers, set layer flags, clean up) should be documented explicitly, with minimal code snippets
- Layer-specific parameters: clarify the difference between ub_tp_comm_overlap and the individual overlap flags
- Hugging Face integration: provide guidance on combining overlap with HF model replacement patterns
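As a starting point for the requested snippet, the three-step process above (initialize, set flags and run, clean up) could be sketched as a small context manager that guarantees cleanup. This is only an illustration: `userbuffers_session` is a hypothetical helper, and the Transformer Engine calls in the trailing comment (`initialize_ub`, `destroy_ub`, and the argument shapes) are assumptions that should be checked against the example script before they go into the README.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def userbuffers_session(
    init_ub: Callable[[], None],
    destroy_ub: Callable[[], None],
) -> Iterator[None]:
    """Hypothetical helper: run init_ub before the body and destroy_ub after it."""
    init_ub()  # step 1: allocate the communication buffers
    try:
        yield  # step 2: build layers with the overlap flags set, then train
    finally:
        destroy_ub()  # step 3: release the buffers, even if the body raised


# Assumed wiring with Transformer Engine (not verified here):
#
#   import transformer_engine.pytorch as te
#
#   with userbuffers_session(
#       lambda: te.module.base.initialize_ub(
#           [seq_len * batch_size, hidden_size], tp_size
#       ),
#       te.module.base.destroy_ub,
#   ):
#       layer = te.TransformerLayer(..., ub_tp_comm_overlap=True)
#       # forward/backward passes go here
```

Passing the setup and teardown steps in as callables keeps the sketch independent of the exact library signatures, which is part of what the documentation would need to pin down.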
Current state:
examples/pytorch/comm_gemm_overlap/README.md covers requirements and run commands, but it gives no guidance on setting up overlap.