fix: store cuda_available variable and extend perf_counter fix to all get_computational_cost functions#553

Open
vinitjain2005 wants to merge 2 commits into JdeRobot:master from vinitjain2005:fix/cuda-sync-device-check

Conversation

@vinitjain2005
Contributor

Summary

This PR extends the timing fix originally proposed in the closed PR to all get_computational_cost() functions across the library, and also addresses the review comment to store torch.cuda.is_available() in a variable instead of calling it on every loop iteration.

Closes #453

Problem

The get_computational_cost() functions across multiple files had three issues:

  1. torch.cuda.synchronize() was called unconditionally on every iteration, even though it is designed only for CUDA devices and is a no-op on CPU

  2. time.time() was used for timing — it has coarse resolution on Windows (~15 ms granularity), so measurements either read 0 ms or jump in 15 ms steps

  3. torch.cuda.is_available() was being called on every loop iteration instead of being stored once as a variable
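The Windows resolution problem is easy to verify from the standard library alone. A quick sketch (not part of this PR) that prints what each clock reports on the current platform:

```python
import time

# Compare the resolution each clock advertises. On Windows, time.time()
# has historically been backed by a ~15.6 ms system tick, while
# time.perf_counter() uses a high-resolution performance counter.
for name in ("time", "perf_counter"):
    info = time.get_clock_info(name)
    print(f"{name}: resolution={info.resolution}, monotonic={info.monotonic}")
```

perf_counter is also monotonic, so it is immune to system clock adjustments mid-benchmark, which is why the Python docs recommend it for timing.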

Fix

# Before
for _ in range(runs):
    torch.cuda.synchronize()        # unconditional, no-op on CPU
    start = time.time()             # low resolution on Windows
    ...
    torch.cuda.synchronize()
    inference_times.append(time.time() - start)

# After
cuda_available = torch.cuda.is_available()  # stored once
for _ in range(runs):
    if cuda_available:
        torch.cuda.synchronize()
    start = time.perf_counter()     # high resolution on all platforms
    ...
    if cuda_available:
        torch.cuda.synchronize()
    inference_times.append(time.perf_counter() - start)
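Put together, the corrected pattern can be packaged as a small self-contained helper. This is a sketch of the pattern, not the library's actual code; `model`, `inputs`, and `runs` are placeholder names:

```python
import time

import torch


def measure_inference_time(model, inputs, runs=10):
    """Average wall-clock time of `runs` forward passes."""
    cuda_available = torch.cuda.is_available()  # stored once, outside the loop
    inference_times = []
    with torch.no_grad():
        for _ in range(runs):
            if cuda_available:
                torch.cuda.synchronize()  # drain queued CUDA kernels first
            start = time.perf_counter()   # high resolution on all platforms
            model(inputs)
            if cuda_available:
                torch.cuda.synchronize()  # include the kernel's full runtime
            inference_times.append(time.perf_counter() - start)
    return sum(inference_times) / len(inference_times)
```

The trailing synchronize matters on GPU because CUDA kernels launch asynchronously; without it, perf_counter would only measure the launch overhead, not the actual compute.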

Files Changed

  • perceptionmetrics/models/torch_detection.py

    • Fixed standalone get_computational_cost() function
  • perceptionmetrics/models/torch_segmentation.py

    • Fixed TorchImageSegmentationModel.get_computational_cost()
    • Fixed TorchLiDARSegmentationModel.get_computational_cost()
  • perceptionmetrics/models/tf_segmentation.py

    • Fixed TensorflowImageSegmentationModel.get_computational_cost()
    • Note: Uses has_gpu variable (already stored before loop)
      and replaces time.time() with time.perf_counter()

Impact

Every user running computational cost estimation on CPU gets accurate timing results. This is especially relevant since CUDA is optional and most contributors and new users run on CPU-only machines.

References

  • PyTorch docs: torch.cuda.synchronize() only synchronizes CUDA device operations
  • Python docs: time.perf_counter() is the recommended high-resolution timer for benchmarking

GitHub: @vinitjain2005

@vinitjain2005 vinitjain2005 force-pushed the fix/cuda-sync-device-check branch from 15c272d to 1bc2e4d Compare April 18, 2026 04:31


Successfully merging this pull request may close these issues.

[Bug] get_computational_cost() uses torch.cuda.synchronize() unconditionally causing inaccurate CPU timing
