Add polygon dataset class #1359

Open
Om-Doiphode wants to merge 9 commits into weecology:main from Om-Doiphode:polygon_dataset

@Om-Doiphode
Contributor

Om-Doiphode commented Mar 24, 2026

Description

This PR adds a polygon dataset loader class that can be used to train segmentation models such as Mask R-CNN.

  1. Create a polygon dataset class.
  2. Write tests for the polygon dataset loader.
  3. Create an evaluation function for polygon predictions.

Related Issue(s)

Fixes issue #758

AI-Assisted Development

  • I used AI tools (e.g., GitHub Copilot, ChatGPT, etc.) in developing this PR
  • I understand all the code I'm submitting
  • I have reviewed and validated all AI-generated code

AI tools used (if applicable):

ChatGPT

@Om-Doiphode Om-Doiphode mentioned this pull request Mar 24, 2026
4 tasks
@jveitchmichaelis
Collaborator

jveitchmichaelis commented Mar 25, 2026

Could you scope this PR to only the dataset/loader, please? You could have a look at the recent keypoint commit for reference.

We can test that we can evaluate predictions (would need to add some to the repo) without integrating the training and inference loop.

We should also think about:

  • Which label formats do we support loading? I think we should probably support MS-COCO at this point, or at least add a conversion script that generates a suitable CSV with the correct geometry.
  • Output format (get_item)? Typically this is a boolean mask per object, or it may need to be type long for torchvision.
  • Similarly, what is the format convention for models on HuggingFace? See MaskFormer, etc.
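
To make the first two questions concrete, here is a rough sketch of what a COCO-to-CSV conversion and a per-object boolean mask could look like. It is not code from this PR. The helper names (`coco_polygon_to_wkt`, `coco_to_csv_rows`, `polygon_to_mask`) and the CSV column names (`image_path`, `label`, `geometry`) are assumptions for illustration, and the rasterizer is a simple even-odd test at pixel centres rather than whatever the loader ends up using.

```python
import csv
import io

import numpy as np


def coco_polygon_to_wkt(flat_coords):
    """Turn one COCO polygon ([x1, y1, x2, y2, ...]) into a WKT string."""
    pts = list(zip(flat_coords[0::2], flat_coords[1::2]))
    ring = pts + pts[:1]  # WKT rings are closed: repeat the first vertex
    return "POLYGON ((" + ", ".join(f"{x:g} {y:g}" for x, y in ring) + "))"


def coco_to_csv_rows(coco):
    """Yield one row per polygon. The column names are assumed, not confirmed."""
    files = {im["id"]: im["file_name"] for im in coco["images"]}
    names = {c["id"]: c["name"] for c in coco["categories"]}
    for ann in coco["annotations"]:
        # COCO stores a list of polygons per annotation; RLE masks are skipped.
        if not isinstance(ann["segmentation"], list):
            continue
        for poly in ann["segmentation"]:
            yield {
                "image_path": files[ann["image_id"]],
                "label": names[ann["category_id"]],
                "geometry": coco_polygon_to_wkt(poly),
            }


def polygon_to_mask(flat_coords, height, width):
    """Rasterize one polygon to a boolean (height, width) mask (even-odd rule)."""
    pts = np.asarray(flat_coords, dtype=float).reshape(-1, 2)
    ys, xs = np.mgrid[0:height, 0:width]
    px, py = xs + 0.5, ys + 0.5  # test each pixel at its centre
    inside = np.zeros((height, width), dtype=bool)
    x0, y0 = pts[-1]
    for x1, y1 in pts:
        if y0 != y1:  # horizontal edges never cross a horizontal ray
            crosses = (y0 > py) != (y1 > py)
            x_at = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
            inside ^= crosses & (px < x_at)  # toggle on each crossing to the right
        x0, y0 = x1, y1
    return inside


# Tiny hand-written COCO-style dictionary (illustrative data only).
coco = {
    "images": [{"id": 1, "file_name": "tile.tif", "height": 8, "width": 8}],
    "categories": [{"id": 1, "name": "Tree"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1,
         "segmentation": [[2, 2, 6, 2, 6, 6, 2, 6]]},
    ],
}

rows = list(coco_to_csv_rows(coco))
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["image_path", "label", "geometry"])
writer.writeheader()
writer.writerows(rows)

# One boolean mask per object, stacked to shape (num_objects, height, width).
masks = np.stack(
    [polygon_to_mask(a["segmentation"][0], 8, 8) for a in coco["annotations"]]
)
```

Stacking to `(num_objects, height, width)` keeps one mask per instance; the dtype can be cast afterwards if the consuming model needs something other than boolean.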

@codecov

codecov bot commented Mar 30, 2026

Codecov Report

❌ Patch coverage is 83.92857% with 27 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.61%. Comparing base (884502e) to head (ceff4f3).
⚠️ Report is 9 commits behind head on main.

Files with missing lines               Patch %   Lines
src/deepforest/datasets/training.py    81.06%    25 Missing ⚠️
src/deepforest/utilities.py            94.44%    2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1359      +/-   ##
==========================================
- Coverage   87.35%   86.61%   -0.74%     
==========================================
  Files          24       24              
  Lines        2981     3332     +351     
==========================================
+ Hits         2604     2886     +282     
- Misses        377      446      +69     
Flag Coverage Δ
unittests 86.61% <83.92%> (-0.74%) ⬇️


@Om-Doiphode
Contributor Author

Hi @jveitchmichaelis,

Do you expect me to evaluate the mask predictions using multi-threshold AP (Average Precision) and AR (Average Recall)?

@jveitchmichaelis
Collaborator

jveitchmichaelis commented Mar 30, 2026

I would use whatever torchmetrics' MeanAveragePrecision provides for now (in segm mode, i.e. iou_type="segm").

https://lightning.ai/docs/torchmetrics/stable/detection/mean_average_precision.html

You might also want to pick another image from OAM-TCD that has only tree labels and no canopy labels, since canopy is a group annotation specific to that dataset and we will probably aim to train a tree-only model here. So, not 5d1cc58493e1130005fc0eb0_2921.

Thanks!
