box_area and box_iou functions for cxcywh format #8961
Comments
Hey @alperenunlu, thanks for your neat PR with the attached test case! This definitely provides a more straightforward way to compute area and IoU for cxcywhr format. At the same time, this also adds some code, which is slightly redundant with the functions for xyxy format. I'd like to understand the benefits of using these new functions over a two-step process: first convert the bounding box format and then compute IoU and areas. Could you please help me understand the trade-offs between these approaches? What are the advantages of having separate functions for cxcywhr format, and how do they outweigh the added complexity? I am trying to understand if the box conversion is actually the bottleneck in data pipelines involving these operations, and what could be the gain in having dedicated optimized functions to address it. Thanks in advance for your input!
Hi @AntoineSimoulin, Thanks a lot for the thoughtful feedback! Just to clarify up front — this PR targets the standard cxcywh format (center-x, center-y, width, height), not cxcywhr with rotation. There’s no orientation handling here (though The motivation behind these functions comes from workflows where bounding boxes are already in Here are a few key advantages of providing native support:
I understand the concern around added code. I've tried to keep the implementation minimal, well-contained, and tested, and I'm definitely open to suggestions. Thanks again, happy to discuss further! Best,
Hey @alperenunlu, yeah, sorry for the confusion, I meant the cxcywh format (center-x, center-y, width, height). I think all of this makes sense. Would it be possible for you to produce a small benchmark to illustrate the gains in terms of performance, precision, and type handling? It would be extremely useful to justify the decision to add the code. Let me know what is possible for you. Thanks a lot for your time and effort!
Hey @AntoineSimoulin, I've extensively profiled the code and included both the implementation and output below. Here's a summary of the results:
Under more realistic conditions (fewer boxes), the improvements are even more meaningful:
These benchmarks were run on a T4 GPU, and the results are consistent with my tests on an M1 MacBook (both CPU and MPS backends). I also ran a separate benchmark using
To ensure consistency, I ran 10 iterations across box counts ranging from 1 to 1001 (in steps of 5). The GPU performance gains remain consistent throughout. One thing to note: while the GPU speedups are stable, on CPU, the performance gain for the IoU function diminishes once comparisons exceed 100x100 boxes. At that point, the IoU computation becomes the bottleneck, and the speedup drops to around 1x. Feel free to test it further!
@alperenunlu Amazing! The benchmark looks good. Can you submit a PR with the changes from cf93d9e? I was thinking that instead of publicly exposing
@AntoineSimoulin Thanks! What do you think? This is the PR: #8992
@NicolasHug Could you also take a look?
🚀 The feature
Native box_area and box_iou functions for cxcywh format.
Motivation, pitch
Since the cxcywh format is common, we can use faster and simpler functions to calculate box area and box IoU directly in this format.
Currently, we first need to convert to the xyxy format before using box_area and box_iou in torchvision.
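To make the comparison concrete, here is a minimal plain-Python sketch of the arithmetic for a single pair of boxes. It is not the code from the linked commit and it does not use torchvision; the function names (`box_area_cxcywh`, `box_iou_cxcywh`, `cxcywh_to_xyxy`, `box_iou_xyxy`) are hypothetical and exist only for illustration. It shows that area needs only `w * h`, and that IoU can be computed from the center/size values without materializing converted boxes.

```python
def box_area_cxcywh(box):
    """Area of a (cx, cy, w, h) box; the center plays no role."""
    _, _, w, h = box
    return w * h


def box_iou_cxcywh(a, b):
    """IoU of two (cx, cy, w, h) boxes, computed without a converted copy."""
    acx, acy, aw, ah = a
    bcx, bcy, bw, bh = b
    # Overlap per axis: smallest far edge minus largest near edge, clamped at 0.
    iw = max(0.0, min(acx + aw / 2, bcx + bw / 2) - max(acx - aw / 2, bcx - bw / 2))
    ih = max(0.0, min(acy + ah / 2, bcy + bh / 2) - max(acy - ah / 2, bcy - bh / 2))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def cxcywh_to_xyxy(box):
    """Two-step baseline, step 1: convert to (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def box_iou_xyxy(a, b):
    """Two-step baseline, step 2: IoU on (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


a = (2.0, 2.0, 4.0, 4.0)  # covers x, y in [0, 4]
b = (4.0, 4.0, 4.0, 4.0)  # covers x, y in [2, 6]
direct = box_iou_cxcywh(a, b)
two_step = box_iou_xyxy(cxcywh_to_xyxy(a), cxcywh_to_xyxy(b))
```

Both paths give the same IoU for this pair (an intersection of 4 over a union of 28). Any speed difference in the tensor setting would come from skipping the intermediate converted tensor, which this scalar sketch does not measure.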
Alternatives
No response
Additional context
I can open a pull request from alperenunlu/vision@cf93d9e