@ilan-gold ilan-gold commented Oct 28, 2025

Description

Great suite here!

I added an example for annbatch although I'm not exactly sure that's where it should live.

I also tinkered with a few things, some of which are in the TODOs:

  1. annbatch.ZarrSparseDataset is not based on torch.utils.data.DataLoader, so num_workers doesn't really apply - the loader is threaded, but I think that is different (I added a comment about this), so I'm not sure how to handle it. That said, there are cases where we might want to use DataLoader (small block_size), but we wouldn't use all workers - maybe only a quarter of the available threads, or something on that order of magnitude. For now I just let the benchmark do whatever. I guess we'll do a grid search with different values of num_workers anyway, so it shouldn't matter.
  2. I think our data loader is faster with chunk_size=1 (i.e., perfect randomness) when wrapped in torch.utils.data.DataLoader, but this isn't a requirement. What is interesting is that the create_collection function does a lot of unnecessary work in that case: it shuffles the data, so in theory we could special-case dataset creation here by just writing the data to disk as zarr v3 anndata. I noticed in the other scripts that SingleCellMemMapDataset takes in a pre-computed format rather than creating one on disk. Should we do that as well? I am a little confused about what the factory function does with the dataset argument.
  3. I have noticed you can't vstack a sparse torch.Tensor. I wrote something to get around this using cupy, but maybe y'all have suggestions? It seems like a universal limitation, but maybe not. I'll take this as a 2.0 problem for us and use cupy when needed; what I committed appears to work. It would be great if torch solved this. I also noticed you can't pin memory for a sparse tensor - that also seems like it would be good to have.
  4. Speaking of cupy, I added it as a dependency, but I'm not sure that makes sense. It gives ZarrSparseDataset a performance boost without relying on torch (i.e., the loader could be used with jax), but it also requires a GPU on the installing machine. Is that a safe assumption? Probably not, but I added it anyway to make this clear; I can change it to an extra.
  5. Is there interest in "fake memory pressure"? This would let us "fake" big data by pre-allocating a buffer that takes up some predefined percentage of memory. It would make the small-dataset case a bit more informative and also give nice metrics like the available_memory / dataset_size ratio as another axis. The 25K dataset is probably too small for this to make sense - it is about 245MB in memory, and I could see allocating amount_of_ram - 245MB bytes and then trying to do anything as flaky - but we could do this on a ~1GB or ~2GB dataset and mock this "big data" behavior. In general I've had trouble reasoning accurately about available physical RAM using psutil, so this might be something to hardcode, i.e., allocate N bytes because we know there are physically N bytes available. I'd be interested in this because we support O_DIRECT reads, having noticed thrashing on some linux machines when the page cache is full: https://annbatch.readthedocs.io/en/latest/zarr-configuration.html#zarrs-performance
  6. Where to dump the data? I think I understood correctly that the on-disk dataset creator, i.e., shared_dataset_factory, should write to disk, but where? I went with Path(input).parent, but maybe that was not right. I couldn't quite tell from the scdataset example, since it includes a pre-computed scdl file, no? I think this is a rehash of point 2. In general I'm not exactly clear on how to create an on-disk dataset - is a new script needed? Are we benchmarking how long it takes to make a dataset?
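To illustrate the workaround in point 3, here is a minimal CPU sketch using scipy.sparse as a stand-in for the GPU path (cupyx.scipy.sparse exposes a matching vstack); the shapes and density are made up for illustration:

```python
import scipy.sparse as sp

# torch.vstack does not support sparse tensors, so mini-batches are
# stacked via the scipy/cupy sparse API instead; on the GPU,
# cupyx.scipy.sparse.vstack mirrors this call on device memory.
a = sp.random(2, 5, density=0.5, format="csr", random_state=0)
b = sp.random(3, 5, density=0.5, format="csr", random_state=1)
stacked = sp.vstack([a, b], format="csr")
print(stacked.shape)  # (5, 5)
```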

Marking as a draft since (a) annbatch is not 0.0.1 yet (so I don't want to assume anything about performance quite yet, since things might change a bit) and (b) I am not sure what else, if anything, is needed (tests? The PR checklist mentions them, but I don't think they're applicable here, although I'd be happy to add some).

In general, ready to go!

Usage

Usage is shown in the added example, following the annbatch API.

Type of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Refactor
  • Documentation update
  • Other (please describe): Benchmarking

CI Pipeline Configuration

Configure CI behavior by applying the relevant labels. By default, only basic unit tests are run.

  • ciflow:skip - Skip all CI tests for this PR
  • ciflow:notebooks - Run Jupyter notebooks execution tests for bionemo2
  • ciflow:slow - Run slow single GPU integration tests marked as @pytest.mark.slow for bionemo2
  • ciflow:all - Run all tests (unit tests, slow tests, and notebooks) for bionemo2. This label can be used to enforce running tests for all bionemo2.
  • ciflow:all-recipes - Run tests for all recipes (under bionemo-recipes). This label can be used to enforce running tests for all recipes.

Unit tests marked as @pytest.mark.multi_gpu or @pytest.mark.distributed are not run in the PR pipeline.

For more details, see CONTRIBUTING

Note

By default, only basic unit tests are run. Add appropriate labels to enable additional test coverage.

Authorizing CI Runs

We use copy-pr-bot to manage authorization of CI
runs on NVIDIA's compute resources.

  • If a pull request is opened by a trusted user and contains only trusted changes, the pull request's code will
    automatically be copied to a pull-request/ prefixed branch in the source repository (e.g. pull-request/123)
  • If a pull request is opened by an untrusted user or contains untrusted changes, an NVIDIA org member must leave an
    /ok to test comment on the pull request to trigger CI. This will need to be done for each new commit.

Pre-submit Checklist

  • I have tested these changes locally
  • I have updated the documentation accordingly
  • I have added/updated tests as needed
  • All existing tests pass successfully

copy-pr-bot bot commented Oct 28, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@ilan-gold (Author)

Ok, we released 0.0.1 and I pinned it. I noticed that scDataset has released 0.2.0, which breaks the tests here - maybe it's best to pin that as well while it's unstable?

ilan-gold and others added 13 commits October 31, 2025 17:28
Signed-off-by: ilan-gold <[email protected]>
Signed-off-by: ilan-gold <[email protected]>
### Description

Adds the codonFM recipe

Does not add top-level README edits.

---------

Signed-off-by: Jonathan Mitchell <[email protected]>
Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: Cory Ye <[email protected]>
Co-authored-by: Peter St. John <[email protected]>
Co-authored-by: Timur Rvachov <[email protected]>
Signed-off-by: ilan-gold <[email protected]>
Signed-off-by: Jonathan Mitchell <[email protected]>
Signed-off-by: ilan-gold <[email protected]>
### Description

root README changes to announce CodonFM

Signed-off-by: Timur Rvachov <[email protected]>
Signed-off-by: ilan-gold <[email protected]>
---------

Signed-off-by: Jonathan Mitchell <[email protected]>
Signed-off-by: ilan-gold <[email protected]>
### Description

rebase on
https://github.com/NVIDIA-Digital-Bio/CodonFM/blob/main/notebooks/3-Zero-Shot-Mutation-Variant-Clinvar-Synonymous.ipynb

### Type of changes

- [x] Bug fix (non-breaking change which fixes an issue)
- [x] Documentation update
Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: ilan-gold <[email protected]>
Signed-off-by: Jonathan Mitchell <[email protected]>
Signed-off-by: ilan-gold <[email protected]>
…ks (NVIDIA#1283)

### Description

* Adding accuracy analysis table to README.md of evo2 submodule.
* Adding `evo2/40b-1m-fp8-bf16:1.0` resource to `load` and
`download_bionemo_data`

#### Usage
On the CLI:
```bash
CKPT_PATH=$(download_bionemo_data evo2/40b-1m-fp8-bf16:1.0)
```

In code:
```python
from bionemo.core.data.load import load
ckpt_path = load("evo2/40b-1m-fp8-bf16:1.0")
```

#### Verification
1. Manually and temporarily replace `nvidia` with `nvstaging` in the ngc
path in the evo2.yaml since the link is not yet public:

```yaml
- tag: 40b-1m-fp8-bf16:1.0
  ngc: nvstaging/clara/evo2-40b-1m-fp8-bf16-nemo2:1.0
```
2. Run the download command and check that it succeeds (this verifies most of
the URL, as well as MD5 sums, etc.):
```bash
CKPT_PATH=$(download_bionemo_data evo2/40b-1m-fp8-bf16:1.0)
```
Returns:
```bash
Downloading data from 'nvstaging/clara/evo2-40b-1m-fp8-bf16-nemo2:1.0' to file '/home/ubuntu/.cache/bionemo/544b47e033d1fb0261b686a53f7c4fe240cd290253187d31e8c99dea9e35a680-evo2_40b_bf16_finetune_wandb_Ji2IRcrz_step_119.tar.gz'.
{
    "download_end": "2025-10-27 23:00:34",
    "download_start": "2025-10-27 22:40:22",
    "download_time": "20m 12s",
    "files_downloaded": 1,
    "local_path": "/home/ubuntu/.cache/bionemo/tmp9tdgbowq/evo2-40b-1m-fp8-bf16-nemo2_v1.0",
    "size_downloaded": "59.31 GB",
    "status": "COMPLETED"
}
Untarring contents of '/home/ubuntu/.cache/bionemo/544b47e033d1fb0261b686a53f7c4fe240cd290253187d31e8c99dea9e35a680-evo2_40b_bf16_finetune_wandb_Ji2IRcrz_step_119.tar.gz' to '/home/ubuntu/.cache/bionemo/544b47e033d1fb0261b686a53f7c4fe240cd290253187d31e8c99dea9e35a680-evo2_40b_bf16_finetune_wandb_Ji2IRcrz_step_119.tar.gz.untar'
```

### Type of changes

- [x] New feature (non-breaking change which adds functionality)
- [x] Documentation update

---------

Signed-off-by: John St John <[email protected]>
Signed-off-by: ilan-gold <[email protected]>
### Type of changes

- [x] Documentation update
Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: ilan-gold <[email protected]>
…adme (NVIDIA#1298)

### Description

Inside tasks.py the following line exists for creating folders.
```python
if not os.path.exists(out_dir):
    os.makedirs(out_dir)
```
However, when multiple processes start concurrently (e.g. on a multi-node run), the following race can occur:
```
Process 0 checks os.path.exists(out_dir) → returns False
Process 1 checks os.path.exists(out_dir) → returns False
Process 0 calls os.makedirs(out_dir) → succeeds
Process 1 calls os.makedirs(out_dir) → fails with FileExistsError
```
The fix is to use `os.makedirs(out_dir, exist_ok=True)`.

---------

Signed-off-by: Jonathan Mitchell <[email protected]>
Signed-off-by: ilan-gold <[email protected]>
@ilan-gold ilan-gold requested a review from tshimko-nv as a code owner October 31, 2025 16:30
@polinabinder1 (Collaborator) left a comment

Point 1. That's a good catch.
Point 2. There is the option of converting it each time vs. just using a created dataset. So for the creation of an SCDL dataset, we have a `create_scdl_dataset_and_loader_factory`. But we could also split it up into `create_scdl_from_anndata` (which would create an SCDL dataset on disk from AnnData) and `create_scdl_dataloader_factory` (which would create a dataloader from that dataset).
Points 3/4. The issue here is that we don't benchmark on GPU currently. It doesn't look like `ZarrSparseDataset` should explicitly depend on cupy; there should be a fallback option that works on CPU.

Point 5. That’s a good idea - I have experimented with adding RAM pressure + running benchmarking. It could be interesting to allow for that pressure within the framework.

Could you share some of the benchmarking results?

I also want to sanity check that the results look reasonable to you. I really appreciate this PR.

@polinabinder1

/ok to test 90d44de

@ilan-gold

> There is the option of converting it each time vs. just using a created dataset. So for the creation of an SCDL dataset, we have a `create_scdl_dataset_and_loader_factory`. But we could also split it up into `create_scdl_from_anndata` (which would create an SCDL dataset on disk from AnnData) and `create_scdl_dataloader_factory` (which would create a dataloader from that dataset).

I looked into this and was a little confused: it looks like there is no "conversion" in the other examples, but rather a `scdl-path` passed in, and then the benchmark measures the time needed to instantiate the class but not to do the conversion, i.e., `dataset = SingleCellMemMapDataset(data_path)` does no actual on-disk writing, no? So I separated that, but maybe I'm wrong.

> The issue here is that we don't benchmark on GPU currently. It doesn't look like `ZarrSparseDataset` should explicitly depend on cupy; there should be a fallback option that works on CPU.

Ok, and my colleague just noticed we assume you have cupy installed by default (i.e., the default settings in `ZarrSparseDataset` use it). So we'll fix that, but it's explicitly set to `False` now. Ideally, when run on a GPU, we would have it installed.

> Point 5. That's a good idea - I have experimented with adding RAM pressure + running benchmarking. It could be interesting to allow for that pressure within the framework.

Yea, this is the only way for me to develop, since I don't have the time/disk space on my Linux machine to get the full dataset, but I'm going to change that soon!

I added cache pressure because, like I said, I've only really looked at it with cache pressure. As you can see, the dataset is 18 GB on disk, so roughly ~100 GB uncompressed, and I only gave myself 10 GB of free RAM. I didn't turn on `direct_io`, but I also don't think it matters much with such a small chunk size (in fact, it's probably harmful); my hunch is that it only becomes useful once the `block_size` gets really big, like 512 (which is probably not a good idea unless your data is shuffled on-disk 😉). Also, I'm very interested in the `num_workers` grid search. So grid searching over both of those would be super cool, i.e., `use_direct_io` on and off, and then `num_workers`.

Here are the results, which broadly make sense to me:
annbatch_benchmark_20251104_121014_detailed_breakdown.csv

@polinabinder1

Here's an example of data conversion: https://github.com/NVIDIA/bionemo-framework/blob/pbinder/benchmark_conversion_example/sub-packages/bionemo-scspeedtest/examples/scdl_conversion_example.py

Could you add the specs of your machine and share a comparison to SCDL? Also, could you share the dataset that this is on? We have seen a lot of variability in our benchmarking work based on the machine, so I would be excited to play with this.

@ilan-gold

ilan-gold commented Nov 5, 2025

`lscpu` gives:

```
Architecture:                x86_64
  CPU op-mode(s):            32-bit, 64-bit
  Address sizes:             48 bits physical, 48 bits virtual
  Byte Order:                Little Endian
CPU(s):                      16
  On-line CPU(s) list:       0-15
Vendor ID:                   AuthenticAMD
  Model name:                AMD EPYC-Rome Processor
    CPU family:              23
    Model:                   49
    Thread(s) per core:      1
    Core(s) per socket:      1
    Socket(s):               16
    Stepping:                0
    BogoMIPS:                5988.74
    Flags:                   fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl
                              cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lah
                             f_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core ssbd ibrs ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi
                             2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr wbnoinvd arat umip rdpid arch_capabilities
Virtualization features:
  Hypervisor vendor:         KVM
  Virtualization type:       full
Caches (sum of all):
  L1d:                       512 KiB (16 instances)
  L1i:                       512 KiB (16 instances)
  L2:                        8 MiB (16 instances)
  L3:                        256 MiB (16 instances)
NUMA:
  NUMA node(s):              1
  NUMA node0 CPU(s):         0-15
Vulnerabilities:
  Gather data sampling:      Not affected
  Indirect target selection: Not affected
  Itlb multihit:             Not affected
  L1tf:                      Not affected
  Mds:                       Not affected
  Meltdown:                  Not affected
  Mmio stale data:           Not affected
  Reg file data sampling:    Not affected
  Retbleed:                  Mitigation; untrained return thunk; SMT disabled
  Spec rstack overflow:      Mitigation; SMT disabled
  Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl and seccomp
  Spectre v1:                Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:                Mitigation; Retpolines; IBPB conditional; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
  Srbds:                     Not affected
  Tsa:                       Not affected
  Tsx async abort:           Not affected
```

I don't know about the SSD this is run on. I could enquire, but it's just a provisioned machine from https://cloud.denbi.de/wiki/, so it shouldn't be anything crazy. That being said, it appears faster than the average SageMaker instance. So I definitely understand your point about hardware differences, but we haven't seen any relative change in performance, i.e., one method being 10X faster than another on one machine but only 5X on another. Maybe that could change; I would be interested to find out!

The data is a 6-million-cell subset of Tahoe. I could try generating an SCDL dataset, but I don't have a script. If you were to move that script into main, I would be happy to run it.

I added a `measure-collection-creation-time` option to the CLI instead of making a separate script. Hope that's ok. I get what seem to be reasonable numbers for dataset creation time when it is enabled.
