fix: store cuda_available variable and extend perf_counter fix to all get_computational_cost functions#553

Open
vinitjain2005 wants to merge 2 commits into JdeRobot:master from vinitjain2005:fix/cuda-sync-device-check

Conversation

@vinitjain2005
Contributor

Summary

This PR extends the timing fix originally proposed in the closed PR to all get_computational_cost() functions across the library, and also addresses the review comment to store torch.cuda.is_available() in a variable instead of calling it on every loop iteration.

Closes #453

Problem

The get_computational_cost() functions across multiple files had three issues:

  1. torch.cuda.synchronize() was called unconditionally on every iteration, even though it is designed only for CUDA devices and is a no-op on CPU

  2. time.time() was used for timing — it has coarse resolution on Windows (~15 ms granularity), so measurements either read 0 ms or jump in 15 ms steps

  3. torch.cuda.is_available() was being called on every loop iteration instead of being stored once as a variable
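The Windows resolution problem is easy to verify from the standard library alone. A quick sketch (not part of this PR) that prints what each clock reports on the current platform:

```python
import time

# Compare the resolution each clock advertises. On Windows, time.time()
# has historically been backed by a ~15.6 ms system tick, while
# time.perf_counter() uses a high-resolution performance counter.
for name in ("time", "perf_counter"):
    info = time.get_clock_info(name)
    print(f"{name}: resolution={info.resolution}, monotonic={info.monotonic}")
```

perf_counter is also monotonic, so it is immune to system clock adjustments mid-benchmark, which is why the Python docs recommend it for timing.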

Fix

# Before
for _ in range(runs):
    torch.cuda.synchronize()        # unconditional, no-op on CPU
    start = time.time()             # low resolution on Windows
    ...
    torch.cuda.synchronize()
    inference_times.append(time.time() - start)

# After
cuda_available = torch.cuda.is_available()  # stored once
for _ in range(runs):
    if cuda_available:
        torch.cuda.synchronize()
    start = time.perf_counter()     # high resolution on all platforms
    ...
    if cuda_available:
        torch.cuda.synchronize()
    inference_times.append(time.perf_counter() - start)
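Put together, the corrected pattern can be packaged as a small self-contained helper. This is a sketch of the pattern, not the library's actual code; `model`, `inputs`, and `runs` are placeholder names:

```python
import time

import torch


def measure_inference_time(model, inputs, runs=10):
    """Average wall-clock time of `runs` forward passes."""
    cuda_available = torch.cuda.is_available()  # stored once, outside the loop
    inference_times = []
    with torch.no_grad():
        for _ in range(runs):
            if cuda_available:
                torch.cuda.synchronize()  # drain queued CUDA kernels first
            start = time.perf_counter()   # high resolution on all platforms
            model(inputs)
            if cuda_available:
                torch.cuda.synchronize()  # include the kernel's full runtime
            inference_times.append(time.perf_counter() - start)
    return sum(inference_times) / len(inference_times)
```

The trailing synchronize matters on GPU because CUDA kernels launch asynchronously; without it, perf_counter would only measure the launch overhead, not the actual compute.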

Files Changed

  • perceptionmetrics/models/torch_detection.py

    • Fixed standalone get_computational_cost() function
  • perceptionmetrics/models/torch_segmentation.py

    • Fixed TorchImageSegmentationModel.get_computational_cost()
    • Fixed TorchLiDARSegmentationModel.get_computational_cost()
  • perceptionmetrics/models/tf_segmentation.py

    • Fixed TensorflowImageSegmentationModel.get_computational_cost()
    • Note: Uses has_gpu variable (already stored before loop)
      and replaces time.time() with time.perf_counter()

Impact

Every user running computational cost estimation on CPU gets accurate timing results. This is especially relevant since CUDA is optional and most contributors and new users run on CPU-only machines.

References

  • PyTorch docs: torch.cuda.synchronize() only synchronizes CUDA device operations
  • Python docs: time.perf_counter() is the recommended high-resolution timer for benchmarking

GitHub: @vinitjain2005

@vinitjain2005 vinitjain2005 force-pushed the fix/cuda-sync-device-check branch from 15c272d to 1bc2e4d Compare April 18, 2026 04:31


Successfully merging this pull request may close these issues.

[Bug] get_computational_cost() uses torch.cuda.synchronize() unconditionally causing inaccurate CPU timing
