
feat(cache): add HDF5 baseline inference cache layer (#567)#568

Open
AdityaX18 wants to merge 3 commits into JdeRobot:master from AdityaX18:issue_#567_hdf5_cache_layer

Conversation

@AdityaX18

Summary

Fixes #567

Adds perceptionmetrics/utils/cache.py — a standalone HDF5-backed cache layer
for baseline inference outputs, implementing Layer 1 of the two-layer cached
architecture discussed in #567.

Problem

The current eval loop re-runs model inference for every perturbation condition.
For N images × P perturbation types × I intensities, this produces N·P·I
forward passes on the clean baseline — all redundant after the first run.

Example: COCO val2017 (5,000 images), 5 perturbation types, 5 intensities →
125,000 forward passes. With this cache: 5,000 baseline passes once, reused
across all conditions.
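As a sanity check, the pass-count arithmetic from the example above (numbers are the ones quoted for COCO val2017, not measured results):

```python
# Pass-count arithmetic from the example above (COCO val2017).
n_images, n_pert_types, n_intensities = 5_000, 5, 5

# Without the cache, the clean baseline is re-run for every condition.
without_cache = n_images * n_pert_types * n_intensities
# With the cache, one clean pass per image, reused across all conditions.
with_cache = n_images

print(without_cache, with_cache)  # 125000 5000
```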

What this PR adds

perceptionmetrics/utils/cache.py:

  • CacheWriter — context manager; writes preprocessed image tensors (C, H, W)
    and detection predictions (bboxes, labels, scores) to HDF5 after the clean
    baseline eval loop.
  • CacheReader — validates model_hash + schema_version on open; provides
    lazy per-image tensor and prediction access.
  • is_cache_valid(path, model_hash) → bool — O(1) guard for the eval loop
    short-circuit.
  • compute_model_hash(model, file_path) → str — SHA-256 of checkpoint file,
    or numel-proxy fallback.
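The hashing idea behind compute_model_hash can be sketched with the standard library alone. This is a minimal illustration of "SHA-256 of checkpoint file, or numel-proxy fallback", not the PR's actual implementation; the function name and fallback parameter here are hypothetical.

```python
# Minimal sketch of the checkpoint-hashing idea; the real
# compute_model_hash in perceptionmetrics/utils/cache.py may differ.
import hashlib
from pathlib import Path


def sketch_model_hash(file_path, fallback_numel=None):
    """SHA-256 of the checkpoint file; when the file is missing,
    fall back to hashing a parameter-count (numel) proxy."""
    path = Path(file_path)
    if path.is_file():
        return hashlib.sha256(path.read_bytes()).hexdigest()
    # numel-proxy fallback: hash the total parameter count instead.
    return hashlib.sha256(str(fallback_numel).encode()).hexdigest()
```

Hashing the checkpoint bytes (rather than, say, the file name) means a stale cache is detected even when weights are overwritten in place.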

tests/test_cache.py: round-trip, stale-hash, is_cache_valid,
zero-detection, metadata, and image_ids tests (6 passing).

pyproject.toml: adds h5py = ">=3.10,<4".

HDF5 schema

cache.hdf5
├── metadata/           (model_name, coco_split, model_hash, timestamp, schema_version)
├── tensors/{img_id}    float32 (C, H, W), chunks=(C, H, W)
└── preds/{img_id}/
    ├── bboxes          float32 (N_det, 4)
    ├── labels          int64   (N_det,)
    └── scores          float32 (N_det,)

Zero-detection images write empty (0,4) / (0,) datasets — consumers need
no branch on group existence.
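A runnable sketch of this schema using h5py directly (not the PR's CacheWriter/CacheReader); the img_id and metadata values are placeholders:

```python
# Write and read back the schema above with plain h5py;
# illustrative only, not the cache.py implementation.
import h5py
import numpy as np

img_id = "000000000139"  # placeholder COCO image id

with h5py.File("cache.hdf5", "w") as f:
    meta = f.create_group("metadata")
    meta.attrs["model_hash"] = "abc123"
    meta.attrs["schema_version"] = 1

    # Preprocessed image tensor, chunked per image.
    f.create_dataset(f"tensors/{img_id}",
                     data=np.zeros((3, 480, 640), dtype=np.float32),
                     chunks=(3, 480, 640))

    # Zero-detection image: empty datasets, so readers need no
    # existence branch.
    g = f.create_group(f"preds/{img_id}")
    g.create_dataset("bboxes", data=np.empty((0, 4), dtype=np.float32))
    g.create_dataset("labels", data=np.empty((0,), dtype=np.int64))
    g.create_dataset("scores", data=np.empty((0,), dtype=np.float32))

with h5py.File("cache.hdf5", "r") as f:
    print(f[f"preds/{img_id}/bboxes"].shape)  # (0, 4)
```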

Out of scope (follow-up PR)

Integration into torch_detection.py's eval() — the read/short-circuit
path that uses is_cache_valid before inference.
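The planned short-circuit guard can be sketched as a constant-time metadata check; this is a hypothetical stand-in for is_cache_valid, not the PR's code:

```python
# Hypothetical sketch of the O(1) validity guard; the real
# is_cache_valid in cache.py may check more or differently.
import h5py
from pathlib import Path


def sketch_is_cache_valid(path, model_hash, schema_version=1):
    """True only if the file exists and its metadata attrs match
    the current model hash and schema version."""
    if not Path(path).is_file():
        return False
    with h5py.File(path, "r") as f:
        meta = f["metadata"].attrs
        return bool(meta.get("model_hash") == model_hash
                    and meta.get("schema_version") == schema_version)
```

Only the metadata group is touched, so the check stays O(1) regardless of how many images the cache holds.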

Testing

poetry run pytest tests/test_cache.py -v

All 6 tests pass.

Add perceptionmetrics/utils/cache.py with CacheWriter, CacheReader,
and is_cache_valid. Serialises preprocessed image tensors and detection
predictions to HDF5 after one clean baseline eval run. Downstream
perturbation conditions reuse the cache, eliminating the N*P*I redundant
forward passes on the clean baseline.

This is Layer 1 (disk cache write/read) only. Integration into
torch_detection.py eval() is a follow-up PR.

- CacheWriter: context manager, writes tensors + preds per image
- CacheReader: validates model_hash + schema_version on open, lazy access
- is_cache_valid: O(1) guard for eval loop short-circuit
- Zero-detection images write empty (0,4)/(0,)/(0,) datasets

Tests: round-trip, stale-hash, is_cache_valid, zero-det, metadata, image_ids.
Adds h5py>=3.10,<4 dependency.

Closes JdeRobot#567
@AdityaX18 AdityaX18 marked this pull request as draft April 26, 2026 09:31
@AdityaX18 AdityaX18 marked this pull request as ready for review April 26, 2026 09:33
@AdityaX18
Author

@dpascualhe, I've finished the HDF5 cache implementation (Layer 1) as we discussed in #567. Ready for review when you have a moment!


Development

Successfully merging this pull request may close these issues.

feat: Add HDF5 baseline inference cache to eliminate redundant forward passes in perturbation evaluation
