feat(datasets): add datetime format detection to dataset columns #36150
Code Review Agent Run #ad70d8: Actionable Suggestions - 0, Additional Suggestions - 20
I've completed my review and didn't find any issues.
Files scanned
| File Path | Reviewed |
|---|---|
| superset/commands/dataset/refresh.py | ✅ |
| superset/migrations/versions/2025-11-17_19-03_a9c01ec10479_add_datetime_format_to_table_columns.py | ✅ |
| superset/datasets/datetime_format_detector.py | ✅ |
| superset/datasets/schemas.py | ✅ |
| superset/datasets/api.py | ✅ |
| superset/utils/core.py | ✅ |
| superset/connectors/sqla/models.py | ✅ |
| superset/config.py | ✅ |
Codecov Report ❌ (patch coverage)

Coverage diff, master vs. #36150:

| Metric | master | #36150 | +/- |
|---|---|---|---|
| Coverage | 60.48% | 67.99% | +7.50% |
| Files | 1931 | 636 | -1295 |
| Lines | 76236 | 46817 | -29419 |
| Branches | 8568 | 5081 | -3487 |
| Hits | 46114 | 31831 | -14283 |
| Misses | 28017 | 13710 | -14307 |
| Partials | 2105 | 1276 | -829 |

Flags with carried forward coverage won't be shown.
e1e54c9 to a86cdfe
Move datetime format detection from query-time to dataset configuration time to improve performance and reduce log noise.

**Changes:**
- Add `datetime_format` column to `table_columns` table
- Create `DatetimeFormatDetector` service to detect and store formats
- Update `RefreshDatasetCommand` to auto-detect formats on dataset sync
- Modify `normalize_dttm_col` to accept pre-detected format mapping
- Add API endpoint for manual format detection
- Add configuration options for feature enablement
- Include comprehensive unit tests

**Benefits:**
- Eliminates repeated format detection on every query
- Reduces "Could not infer format" warnings in logs
- Provides consistent datetime parsing across queries
- Allows manual format override via API

Addresses performance concerns raised in PR apache#35042
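As a rough illustration of what detecting a format once from sampled values can look like, here is a minimal sketch. It is not the PR's `DatetimeFormatDetector`: the candidate format list and the `detect_datetime_format` helper are assumptions made for this example, and it uses only the standard library.

```python
from datetime import datetime
from typing import Iterable, Optional

# Hypothetical candidate list; the formats the PR's detector actually
# tries are not shown in this conversation.
CANDIDATE_FORMATS = (
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%d",
    "%m/%d/%Y",
    "%d/%m/%Y",
)


def detect_datetime_format(samples: Iterable[Optional[str]]) -> Optional[str]:
    """Return the first candidate format that parses every non-null sample."""
    values = [s for s in samples if s]
    if not values:
        # Nothing to inspect, so no format can be claimed.
        return None
    for fmt in CANDIDATE_FORMATS:
        try:
            for value in values:
                datetime.strptime(value, fmt)
        except ValueError:
            # At least one sample does not match this format; try the next.
            continue
        return fmt
    return None
```

The result would be stored once per column (for example in `datetime_format`) so that queries do not need to repeat the inference. Note that ambiguous samples such as `01/02/2024` match whichever candidate comes first in the list, which is one reason a manual override is useful.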
- Fix SQL LIMIT portability by using database.apply_limit_to_sql() instead of hard-coded LIMIT syntax (supports SQL Server, Oracle, etc.)
- Add virtual dataset and expression column handling to skip unsupported cases
- Fix test mocking to include engine context manager and apply_limit_to_sql
- Add test coverage for virtual datasets and expression columns

Note: format_map integration will be added to superset/models/helpers.py after rebase
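To show why a hard-coded `LIMIT` clause is not portable, the sketch below wraps a sampling query in dialect-specific row-limiting syntax. The `apply_limit` function and the dialect names are hypothetical and exist only for illustration; the PR delegates this to `database.apply_limit_to_sql()`, whose implementation is not reproduced here.

```python
def apply_limit(sql: str, limit: int, dialect: str) -> str:
    """Wrap a query in the row-limiting syntax of the given dialect.

    Hypothetical helper for illustration only.
    """
    sql = sql.strip().rstrip(";")
    if dialect == "mssql":
        # SQL Server has no LIMIT clause; it uses TOP.
        return f"SELECT TOP {limit} * FROM ({sql}) AS sample_query"
    if dialect == "oracle":
        # Oracle 12c and later accept the standard FETCH FIRST clause.
        return f"SELECT * FROM ({sql}) FETCH FIRST {limit} ROWS ONLY"
    # Many other engines (PostgreSQL, MySQL, SQLite) accept LIMIT.
    return f"SELECT * FROM ({sql}) AS sample_query LIMIT {limit}"
```

Routing the limit through the database abstraction keeps the detector free of engine-specific branches like these.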
Build format_map from detected datetime formats stored in dataset columns and pass it to normalize_dttm_col() so the detected formats are actually used during query execution, eliminating the need for runtime detection.
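The idea of building a format map from stored column metadata and using it at parse time can be sketched as follows. The dictionary shape for columns and the `build_format_map` and `parse_value` helpers are assumptions for this example; the real integration passes the map to `normalize_dttm_col()`, whose signature is not shown in this conversation.

```python
from datetime import datetime


def build_format_map(columns: list[dict]) -> dict[str, str]:
    """Map column name to stored format, skipping columns without one."""
    return {
        col["column_name"]: col["datetime_format"]
        for col in columns
        if col.get("datetime_format")
    }


def parse_value(column: str, raw: str, format_map: dict[str, str]) -> datetime:
    """Parse with the stored format when available, else fall back."""
    fmt = format_map.get(column)
    if fmt is not None:
        # Pre-detected format: no inference needed at query time.
        return datetime.strptime(raw, fmt)
    # Stand-in for runtime inference; here it only accepts ISO 8601 input.
    return datetime.fromisoformat(raw)
```

Columns without a stored format still go through the fallback path, so datasets that have not been refreshed keep working.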
9c477a4 to abd72ca
mistercrunch left a comment
LGTM!
…che#36150) Co-authored-by: Claude Code <[email protected]>
) Co-authored-by: Claude Code <[email protected]> (cherry picked from commit 06a8f4d)
SUMMARY
This PR moves datetime format detection from query-time to dataset configuration time, addressing performance concerns and log noise issues raised in PR #35042.
What changed:
- Add `datetime_format` column to the `table_columns` database table to persist detected formats
- Create `DatetimeFormatDetector` service class that samples column data and detects datetime formats during dataset creation/refresh
- Update `RefreshDatasetCommand` to automatically detect and store formats when datasets are synced
- Modify `normalize_dttm_col()` utility to accept pre-detected format mappings, eliminating runtime detection
- Add API endpoint `POST /api/v1/dataset/<pk>/detect_datetime_formats` for manual format detection
- Add configuration options `DATASET_AUTO_DETECT_DATETIME_FORMATS` and `DATETIME_FORMAT_DETECTION_SAMPLE_SIZE`

Why this matters:

Technical approach:
- `quoted_name` for safe SQL identifier handling

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
N/A - Backend-only change
TESTING INSTRUCTIONS
Set up test database with datetime columns:
Add dataset to Superset:
`test_dates` table
Verify format detection:
Test manual detection API:
Verify query performance:
Run unit tests:
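For the manual detection API step above, a request could be built along these lines. Only the endpoint path comes from this PR; the base URL, the dataset id, bearer-token authentication, and the `build_detect_request` helper are assumptions, and the response shape is not shown in the source.

```python
import urllib.request


def build_detect_request(
    base_url: str, dataset_id: int, token: str
) -> urllib.request.Request:
    """Build a POST request for the manual format detection endpoint."""
    url = (
        f"{base_url.rstrip('/')}/api/v1/dataset/"
        f"{dataset_id}/detect_datetime_formats"
    )
    return urllib.request.Request(
        url,
        data=b"",  # empty body; the endpoint's expected payload is an assumption
        method="POST",
        headers={
            # Assumed auth scheme; adjust to how your deployment authenticates.
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen()` would send the request against a running instance.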
ADDITIONAL INFORMATION
- `POST /api/v1/dataset/<pk>/detect_datetime_formats`
- `DATASET_AUTO_DETECT_DATETIME_FORMATS`, `DATETIME_FORMAT_DETECTION_SAMPLE_SIZE`

Related work: