Add Fuzzy Versions, to compare bytes <> socrata #2061

alexrichey · 2025-11-13T21:35:10Z

This modifies a few things:

Adds fuzzy version comparisons to the BYTES <> Socrata script so that we can compare version formats like "2025-01-01" with "2025Q1", or "Oct 2025" with "2025-10-01", to see what's potentially out of date.

Modifies the output dataframe to make results more actionable. Namely, it sorts products into three groups, in this order:
1. Products with ANY out of date datasets, where there exists a Socrata version
2. Products with ANY out of date datasets, but no Socrata version (these may be out of date... or they may just be TODO on the Socrata side, and so have no version)
3. The rest
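The three-tier ordering above can be sketched as follows. This is a minimal illustration using plain Python records rather than a dataframe; the `Row` fields, `product_tier`, and `sort_rows` are hypothetical names, and it assumes a dataset with no Socrata version is already flagged as out of date upstream. It is not the code from this PR.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Row:
    """One dataset comparison result (hypothetical shape, for illustration)."""
    product: str
    dataset: str
    socrata_version: Optional[str]  # None when Socrata has no version
    out_of_date: bool


def product_tier(rows: List[Row]) -> int:
    """Return 0, 1 or 2 for the three groups described above."""
    stale = [r for r in rows if r.out_of_date]
    if any(r.socrata_version is not None for r in stale):
        return 0  # something is out of date and Socrata has a version
    if stale:
        return 1  # out of date, but no Socrata version to compare against
    return 2  # the rest


def sort_rows(rows: List[Row]) -> List[Row]:
    # Group rows by product so the tier is decided per product, not per row.
    by_product: Dict[str, List[Row]] = {}
    for r in rows:
        by_product.setdefault(r.product, []).append(r)
    tiers = {p: product_tier(rs) for p, rs in by_product.items()}
    return sorted(rows, key=lambda r: (tiers[r.product], r.product, r.dataset))
```

Deciding the tier per product keeps all of a product's datasets together in the output, including the ones that are up to date.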

This PR is mostly slopped. I took a shot at integrating this with our existing dcpy.utils.versions code, but came away thinking that will be a bigger endeavor (and probably a refactor of that module).

codecov · 2025-11-13T21:42:11Z

Codecov Report

❌ Patch coverage is 53.03030% with 31 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.65%. Comparing base (751229e) to head (65884dc).
⚠️ Report is 14 commits behind head on main.

Files with missing lines	Patch %	Lines
dcpy/lifecycle/scripts/version_compare.py	53.03%	29 Missing and 2 partials ⚠️

Additional details and impacted files

Files with missing lines	Coverage Δ
dcpy/lifecycle/scripts/version_compare.py	`41.74% <53.03%> (+41.74%)`	⬆️

... and 4 files with indirect coverage changes


fvankrieken · 2025-11-24T21:02:16Z

dcpy/lifecycle/scripts/version_compare.py

+        if not self.original or not other.original:
+            return False
+
+        # Direct string comparison (handles case differences)
+        if self.original.lower().strip() == other.original.lower().strip():
+            return True
+
+        # Compare normalized versions
+        return self.normalized == other.normalized


Just thinking about trying to trim down a little...

Could we declare original and normalized as typed class attributes? I don't see how this first if clause would ever be met.

Then, since we're not returning "exact match" or the like, just true either way, should we just compare the normalized versions? We've already computed them and they should be identical if the originals were identical. Meaning we could skip the second if clause as well.

Yeah, I'm with you. Made it terse. For the first case (if you're referring to if not self.original or not other.original) it'd be when it resolves to None == None, which we want to be False.

I got bad news bucko

Wow my reading comprehension is terrible today sorry
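A minimal sketch of the terse form discussed in this thread, assuming a small class with typed `original` and `normalized` attributes. The class name `FuzzyVersion` and the choice to guard on `normalized` are assumptions for illustration; the merged code may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FuzzyVersion:
    """Hypothetical stand-in for the version wrapper under review."""
    original: Optional[str]
    normalized: Optional[str]

    def probably_equals(self, other: "FuzzyVersion") -> bool:
        # Two missing versions would compare as None == None, i.e. True,
        # so reject missing values explicitly before comparing.
        if self.normalized is None or other.normalized is None:
            return False
        # Identical originals normalize identically, so a separate
        # string comparison on `original` adds nothing.
        return self.normalized == other.normalized
```

The explicit guard is what keeps "no version on either side" from being reported as a match.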

fvankrieken · 2025-11-24T21:03:08Z

dcpy/lifecycle/scripts/version_compare.py

+        version = self.original.lower().strip()
+
+        # Handle quarter notation (e.g., "25q1", "24q2")
+        quarter_match = re.match(r"^(\d{2})q([1-4])$", version)


can this be q|Q? I feel like we see both

Sorry! We've already lowercased the string at this point

indeed - though latest commit adds an explicit test case for that (there was something close before, but not quite) and changes the comment
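To illustrate the point about case: because the string is lowercased before matching, the pattern from the diff accepts both "25q1" and "25Q1" without needing `[qQ]`. The function name `parse_quarter` and the mapping of two-digit years onto the 2000s are assumptions made for this sketch.

```python
import re
from typing import Optional, Tuple

# Same pattern as in the diff: two-digit year, "q", quarter 1-4.
QUARTER_RE = re.compile(r"^(\d{2})q([1-4])$")


def parse_quarter(raw: str) -> Optional[Tuple[int, int]]:
    """Return (year, quarter), or None if `raw` is not quarter notation."""
    version = raw.lower().strip()  # lowercasing makes "Q" and "q" equivalent
    match = QUARTER_RE.match(version)
    if match is None:
        return None
    # Assumption: two-digit years refer to the 2000s.
    return 2000 + int(match.group(1)), int(match.group(2))
```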

damonmcc · 2025-11-24T22:07:51Z

dcpy/test/lifecycle/scripts/test_version_comparison.py

+            ("September 2025", "202509", True),
+            ("JUNE 2024", "24q2", True),
+            ("march 2025", "20250315", True),
+            ("Q1 2025", "january 2025", False),  # Different months in Q1


sorry why aren't these probably equal?

edit: oh direction matters! if something is Q1 it could be any of 3 different months so we'd rather not say these're equalish
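One way to read "direction matters" is as containment: the left-hand version matches only if its span of months falls inside the right-hand version's span. The sketch below is a hypothetical model that happens to reproduce the four test cases quoted above; `to_span` and this standalone `probably_equals` are illustrative names, it handles only the formats in those cases (no abbreviations such as "Oct"), and it is not the PR's implementation.

```python
import re
from typing import Optional, Tuple

MONTHS = {name: i + 1 for i, name in enumerate([
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december"])}

# A version is modelled as (year, first_month, last_month).
Span = Tuple[int, int, int]


def to_span(raw: str) -> Optional[Span]:
    v = raw.lower().strip()
    m = re.match(r"^(\d{2})q([1-4])$", v)  # "24q2"
    if m:
        q = int(m.group(2))
        return 2000 + int(m.group(1)), 3 * q - 2, 3 * q
    m = re.match(r"^q([1-4]) (\d{4})$", v)  # "q1 2025"
    if m:
        q = int(m.group(1))
        return int(m.group(2)), 3 * q - 2, 3 * q
    m = re.match(r"^([a-z]+) (\d{4})$", v)  # "march 2025"
    if m and m.group(1) in MONTHS:
        month = MONTHS[m.group(1)]
        return int(m.group(2)), month, month
    m = re.match(r"^(\d{4})(\d{2})(\d{2})?$", v)  # "202509" or "20250315"
    if m and 1 <= int(m.group(2)) <= 12:
        month = int(m.group(2))  # the day, if present, is ignored
        return int(m.group(1)), month, month
    return None


def probably_equals(left: str, right: str) -> bool:
    a, b = to_span(left), to_span(right)
    if a is None or b is None:
        return False
    # True only when left's months lie within right's months (same year).
    return a[0] == b[0] and b[1] <= a[1] and a[2] <= b[2]
```

Under this model "JUNE 2024" fits inside "24q2", but "Q1 2025" covers three months and so does not fit inside "january 2025".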

alexrichey force-pushed the ar-fuzzy-versions branch 3 times, most recently from 5e602fc to 86a3045 Compare November 24, 2025 18:59

Add Fuzzy Versions, to compare bytes <> socrata

dacb0ed

alexrichey force-pushed the ar-fuzzy-versions branch from 86a3045 to dacb0ed Compare November 24, 2025 19:08

alexrichey requested review from damonmcc and fvankrieken and removed request for damonmcc and fvankrieken November 24, 2025 20:47

alexrichey assigned damonmcc and fvankrieken Nov 24, 2025

alexrichey marked this pull request as ready for review November 24, 2025 20:52

fvankrieken reviewed Nov 24, 2025

alexrichey added 3 commits November 24, 2025 16:22

post-review: remove the blub in probably_equals

fa59777

make quarter notation a little more explicit

ee26301

Fix ruff issue

65884dc

fvankrieken approved these changes Nov 24, 2025

alexrichey merged commit c4e9c7c into main Nov 24, 2025
25 checks passed

alexrichey deleted the ar-fuzzy-versions branch November 24, 2025 21:42

damonmcc reviewed Nov 24, 2025

4 participants
