
feat(cache): add HDF5 baseline inference cache layer (#567)#568

Open
AdityaX18 wants to merge 3 commits into JdeRobot:master from AdityaX18:issue_#567_hdf5_cache_layer

Conversation

@AdityaX18

Summary

Fixes #567

Adds perceptionmetrics/utils/cache.py — a standalone HDF5-backed cache layer
for baseline inference outputs, implementing Layer 1 of the two-layer cached
architecture discussed in #567.

Problem

The current eval loop re-runs model inference for every perturbation condition.
For N images × P perturbation types × I intensities, this produces N·P·I
forward passes on the clean baseline — all redundant after the first run.

Example: COCO val2017 (5,000 images), 5 perturbation types, 5 intensities →
125,000 forward passes. With this cache: 5,000 baseline passes once, reused
across all conditions.
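As a sanity check, the pass-count arithmetic from the example above (numbers are the ones quoted for COCO val2017, not measured results):

```python
# Pass-count arithmetic from the example above (COCO val2017).
n_images, n_pert_types, n_intensities = 5_000, 5, 5

# Without the cache, the clean baseline is re-run for every condition.
without_cache = n_images * n_pert_types * n_intensities
# With the cache, one clean pass per image, reused across all conditions.
with_cache = n_images

print(without_cache, with_cache)  # 125000 5000
```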

What this PR adds

perceptionmetrics/utils/cache.py:

  • CacheWriter — context manager; writes preprocessed image tensors (C, H, W)
    and detection predictions (bboxes, labels, scores) to HDF5 after the clean
    baseline eval loop.
  • CacheReader — validates model_hash + schema_version on open; provides
    lazy per-image tensor and prediction access.
  • is_cache_valid(path, model_hash) → bool — O(1) guard for the eval loop
    short-circuit.
  • compute_model_hash(model, file_path) → str — SHA-256 of checkpoint file,
    or numel-proxy fallback.
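The hashing idea behind compute_model_hash can be sketched with the standard library alone. This is a minimal illustration of "SHA-256 of checkpoint file, or numel-proxy fallback", not the PR's actual implementation; the function name and fallback parameter here are hypothetical.

```python
# Minimal sketch of the checkpoint-hashing idea; the real
# compute_model_hash in perceptionmetrics/utils/cache.py may differ.
import hashlib
from pathlib import Path


def sketch_model_hash(file_path, fallback_numel=None):
    """SHA-256 of the checkpoint file; when the file is missing,
    fall back to hashing a parameter-count (numel) proxy."""
    path = Path(file_path)
    if path.is_file():
        return hashlib.sha256(path.read_bytes()).hexdigest()
    # numel-proxy fallback: hash the total parameter count instead.
    return hashlib.sha256(str(fallback_numel).encode()).hexdigest()
```

Hashing the checkpoint bytes (rather than, say, the file name) means a stale cache is detected even when weights are overwritten in place.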

tests/test_cache.py: round-trip, stale-hash, is_cache_valid,
zero-detection, metadata, and image_ids tests (6 passing).

pyproject.toml: adds h5py = ">=3.10,<4".

HDF5 schema

cache.hdf5
├── metadata/           (model_name, coco_split, model_hash, timestamp, schema_version)
├── tensors/{img_id}    float32 (C, H, W), chunks=(C, H, W)
└── preds/{img_id}/
    ├── bboxes          float32 (N_det, 4)
    ├── labels          int64   (N_det,)
    └── scores          float32 (N_det,)

Zero-detection images write empty (0,4) / (0,) datasets — consumers need
no branch on group existence.
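A runnable sketch of this schema using h5py directly (not the PR's CacheWriter/CacheReader); the img_id and metadata values are placeholders:

```python
# Write and read back the schema above with plain h5py;
# illustrative only, not the cache.py implementation.
import h5py
import numpy as np

img_id = "000000000139"  # placeholder COCO image id

with h5py.File("cache.hdf5", "w") as f:
    meta = f.create_group("metadata")
    meta.attrs["model_hash"] = "abc123"
    meta.attrs["schema_version"] = 1

    # Preprocessed image tensor, chunked per image.
    f.create_dataset(f"tensors/{img_id}",
                     data=np.zeros((3, 480, 640), dtype=np.float32),
                     chunks=(3, 480, 640))

    # Zero-detection image: empty datasets, so readers need no
    # existence branch.
    g = f.create_group(f"preds/{img_id}")
    g.create_dataset("bboxes", data=np.empty((0, 4), dtype=np.float32))
    g.create_dataset("labels", data=np.empty((0,), dtype=np.int64))
    g.create_dataset("scores", data=np.empty((0,), dtype=np.float32))

with h5py.File("cache.hdf5", "r") as f:
    print(f[f"preds/{img_id}/bboxes"].shape)  # (0, 4)
```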

Out of scope (follow-up PR)

Integration into torch_detection.py's eval() — the read/short-circuit
path that uses is_cache_valid before inference.
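The planned short-circuit guard can be sketched as a constant-time metadata check; this is a hypothetical stand-in for is_cache_valid, not the PR's code:

```python
# Hypothetical sketch of the O(1) validity guard; the real
# is_cache_valid in cache.py may check more or differently.
import h5py
from pathlib import Path


def sketch_is_cache_valid(path, model_hash, schema_version=1):
    """True only if the file exists and its metadata attrs match
    the current model hash and schema version."""
    if not Path(path).is_file():
        return False
    with h5py.File(path, "r") as f:
        meta = f["metadata"].attrs
        return bool(meta.get("model_hash") == model_hash
                    and meta.get("schema_version") == schema_version)
```

Only the metadata group is touched, so the check stays O(1) regardless of how many images the cache holds.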

Testing

poetry run pytest tests/test_cache.py -v

All 6 tests pass.

Add perceptionmetrics/utils/cache.py with CacheWriter, CacheReader,
and is_cache_valid. Serialises preprocessed image tensors and detection
predictions to HDF5 after one clean baseline eval run. Downstream
perturbation conditions reuse the cache, eliminating the N*P*I redundant
forward passes on the clean baseline.

This is Layer 1 (disk cache write/read) only. Integration into
torch_detection.py eval() is a follow-up PR.

- CacheWriter: context manager, writes tensors + preds per image
- CacheReader: validates model_hash + schema_version on open, lazy access
- is_cache_valid: O(1) guard for eval loop short-circuit
- Zero-detection images write empty (0,4)/(0,)/(0,) datasets

Tests: round-trip, stale-hash, is_cache_valid, zero-det, metadata, image_ids.
Adds h5py>=3.10,<4 dependency.

Closes JdeRobot#567
@AdityaX18 AdityaX18 marked this pull request as draft April 26, 2026 09:31
@AdityaX18 AdityaX18 marked this pull request as ready for review April 26, 2026 09:33
@AdityaX18
Author

@dpascualhe, I've finished the HDF5 cache implementation (Layer 1) as we discussed in #567. Ready for review when you have a moment!


Development

Successfully merging this pull request may close these issues.

feat: Add HDF5 baseline inference cache to eliminate redundant forward passes in perturbation evaluation
