Conversation

@mtennenhaus

What?

Add storage read/write support to the sequential KV bench (matrix indices ≥ world_size map to storage), with per-TP files and NIXL FILE transfers for rank→storage and storage→rank I/O.

Why?

  • Benchmark realistic tiered I/O (KV offload/load) alongside rank↔rank transfers.
  • Measure storage bandwidth/latency and interaction with GPU/CPU paths.
  • Ensure deterministic, isolated storage regions per TP for reproducible results.

How?

  • The storage endpoint is used as the base directory.
  • The CLI requires a base storage path; each TP uses base/tp_/obj_<storage_idx>.bin.
  • TP matrix rows/columns whose index is greater than or equal to the world size are storage endpoints (files for read/write); see the mapping sketch after this list.
    Example with world size 1:
    (0 0)
    (100m 0)
    Row 1 = storage file 0: 100 MB will be read from the file (created in the init phase) into rank 0 via the selected backend.
  • Rank 0 prep: delete and recreate the TP directory, and pre-create any files needed for reads (see the second sketch after this list).
  • Transfers:
    • Rank↔rank transfers use the existing UCX backend.
    • Storage transfers use POSIX or GDS.
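
To make the index-to-endpoint mapping concrete, here is a minimal Python sketch, assuming hypothetical names (classify_endpoint, base_path, tp_idx) and an assumed per-TP directory name of the form tp_<tp_idx>; none of these are confirmed identifiers from the benchmark code.

```python
# Illustrative sketch only: classify_endpoint, base_path, and tp_idx are
# hypothetical names, not the benchmark's actual identifiers.
from pathlib import Path

def classify_endpoint(idx: int, world_size: int, base_path: Path, tp_idx: int):
    """Resolve a TP-matrix row/column index to a rank or a storage file.

    Indices < world_size address ranks; indices >= world_size address
    per-TP storage files (obj_<storage_idx>.bin under the TP directory).
    """
    if idx < world_size:
        return ("rank", idx)
    storage_idx = idx - world_size
    # Assumed per-TP layout: <base>/tp_<tp_idx>/obj_<storage_idx>.bin
    return ("storage", base_path / f"tp_{tp_idx}" / f"obj_{storage_idx}.bin")

# With world_size == 1 and the matrix [[0, 0], [100M, 0]], index 1 resolves
# to storage file 0, so 100 MB is read from that file into rank 0.
```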
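
The rank-0 preparation step can be sketched with plain file operations as well; prepare_storage and read_sizes are hypothetical names, and zero-filling is an assumption about how the files are populated.

```python
# Hedged sketch of the rank-0 prep step; prepare_storage and read_sizes are
# hypothetical names, and zero-filled contents are an assumption.
import shutil
from pathlib import Path

def prepare_storage(tp_dir: Path, read_sizes: dict[int, int]) -> None:
    """Delete and recreate the per-TP directory, pre-creating every file
    that will be read so storage-to-rank transfers hit real data."""
    if tp_dir.exists():
        shutil.rmtree(tp_dir)  # wipe stale state for a deterministic run
    tp_dir.mkdir(parents=True)
    for storage_idx, nbytes in read_sizes.items():
        path = tp_dir / f"obj_{storage_idx}.bin"
        path.write_bytes(b"\x00" * nbytes)  # file must exist before it is read
```

Recreating the directory from scratch is what provides the deterministic, isolated per-TP storage regions called out under Why.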

@copy-pr-bot

copy-pr-bot bot commented Nov 17, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions

👋 Hi mtennenhaus! Thank you for contributing to ai-dynamo/nixl.

Your PR reviewers will review your contribution and then trigger the CI to test your changes.

🚀

@brminich
Contributor

/ok to test 944f1e6

@brminich
Contributor

/build
