
fix(mito): use 1day as default time partition duration #6202


Merged
merged 7 commits into GreptimeTeam:main from fix/default-time-window
Jun 8, 2025

Conversation

v0y4g3r
Contributor

@v0y4g3r v0y4g3r commented May 28, 2025

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

What's changed and what's your intention?

This PR changes the default time partition duration from unbounded to 1 day.

`TimePartitions` splits inserted data according to a `part_duration`. Previously, `part_duration` was unset by default, so all rows were written to a single default time partition with no time range. When the first compaction was triggered, GreptimeDB inferred a suitable time window from the flushed files.

However, this causes problems when the initial data covers a very large time span, for example 10 years. We would end up with an SST file that covers that whole span, which in turn could overlap with nearly every new SST file.
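
To make the overlap problem concrete, here is a minimal sketch. `FileRange`, `overlaps`, and `count_overlapping` are hypothetical names for illustration only and are not types from mito; timestamps are assumed to be milliseconds with inclusive bounds.

```rust
/// Hypothetical inclusive time range [start, end] of an SST file, in milliseconds.
/// This is an illustration only, not a type from the mito crate.
#[derive(Debug, Clone, Copy)]
struct FileRange {
    start: i64,
    end: i64,
}

/// Two inclusive ranges overlap when each one starts before the other ends.
fn overlaps(a: FileRange, b: FileRange) -> bool {
    a.start <= b.end && b.start <= a.end
}

/// Counts how many files in `others` overlap with `wide`.
fn count_overlapping(wide: FileRange, others: &[FileRange]) -> usize {
    others.iter().filter(|f| overlaps(wide, **f)).count()
}
```

In this model, a file spanning 10 years overlaps every later file whose timestamps fall inside that span, while files bounded to different days do not overlap each other.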

This PR makes 1 day the default `part_duration` when no compaction time window is specified at table creation. This default should suit most cases.
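
As a rough sketch of what splitting by `part_duration` means, the helpers below align a timestamp to a fixed-size window. `partition_range` and `partitions_touched` are hypothetical names, not functions from `time_partition.rs`; the sketch assumes millisecond timestamps and a non-zero duration.

```rust
use std::time::Duration;

/// Hypothetical helper: returns the half-open window [start, end), in
/// milliseconds, that contains `ts_millis` when time is cut into buckets of
/// `part_duration`. Assumes `part_duration` is non-zero.
fn partition_range(ts_millis: i64, part_duration: Duration) -> (i64, i64) {
    let bucket = part_duration.as_millis() as i64;
    // `div_euclid` rounds toward negative infinity, so timestamps before the
    // epoch still align to the start of their bucket.
    let start = ts_millis.div_euclid(bucket) * bucket;
    (start, start + bucket)
}

/// Hypothetical helper: number of distinct buckets touched by the inclusive
/// timestamp span [min_ts, max_ts].
fn partitions_touched(min_ts: i64, max_ts: i64, part_duration: Duration) -> i64 {
    let bucket = part_duration.as_millis() as i64;
    max_ts.div_euclid(bucket) - min_ts.div_euclid(bucket) + 1
}
```

With a 1-day duration, an initial write spanning 10 years would land in about 3650 separate windows under this scheme instead of one unbounded partition.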

PR Checklist

Please convert it to a draft if some of the following conditions are not met.

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.
  • API changes are backward compatible.
  • Schema or data changes are backward compatible.

@v0y4g3r v0y4g3r requested review from evenyag, waynexia and a team as code owners May 28, 2025 09:37
@github-actions github-actions bot added the docs-not-required This change does not impact docs. label May 28, 2025
v0y4g3r added 6 commits June 3, 2025 07:05
 ## Add Helper Functions and Enhance Compaction Tests

 - **Refactor Compaction Logic**: Introduced helper functions `flush` and `compact` in `compaction_test.rs` to streamline compaction operations.
 - **Enhance Compaction Tests**: Added a new test `test_infer_compaction_time_window` in `compaction_test.rs` to verify compaction time window inference.
 - **Testing Improvements**: Added `#[cfg(test)]` attribute to `new_multi_partitions` in `time_partition.rs` to ensure it's only included in test builds.
 - **Refactor `TimePartition` Struct**: Removed unnecessary comments regarding `time_range` in `time_partition.rs`.
 - **Enhance `TimePartitions` Functionality**: Added a method `part_duration_or_default` to provide a default partition duration in `time_partition.rs`.
 - **Update SQL Test Cases**: Modified SQL operations and expected results in `scan_big_varchar.result` and `scan_big_varchar.sql` to reflect changes in data manipulation logic.
 ### Update Time Partition Default Duration

 - **Refactor Default Duration**: Introduced `INITIAL_TIME_WINDOW` constant to define the default time window duration as `Duration::from_days(1)`. This change replaces multiple instances of the hardcoded default duration across the `time_partition.rs` file.
 - **Files Affected**: `time_partition.rs`
 ## Update Partition Duration Handling

 - **`time_partition.rs`**: Refactored `part_duration` to be non-optional, removing `Option` wrapper. Updated logic to use `unwrap_or` with `INITIAL_TIME_WINDOW` where necessary. Adjusted related methods and tests to accommodate this change.
 - **`version.rs` (memtable and region)**: Updated handling of `part_duration` to align with changes in `time_partition.rs`, ensuring consistent use of non-optional `Duration`.
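
The fallback described in these commits can be sketched as follows. This is an assumption-laden illustration rather than the code in `time_partition.rs`: the constant mirrors the PR's `INITIAL_TIME_WINDOW` but is written with `Duration::from_secs` so it builds on stable Rust, and `resolve_part_duration` is a hypothetical function name.

```rust
use std::time::Duration;

/// Stand-in for the PR's `INITIAL_TIME_WINDOW` (1 day). Written with
/// `from_secs` here as an assumption so the sketch compiles on stable Rust.
const INITIAL_TIME_WINDOW: Duration = Duration::from_secs(24 * 60 * 60);

/// Hypothetical helper: picks the partition duration for a table. An explicit
/// compaction time window wins; otherwise the 1-day default is used, so the
/// result is always a plain `Duration` rather than an `Option`.
fn resolve_part_duration(compaction_time_window: Option<Duration>) -> Duration {
    compaction_time_window.unwrap_or(INITIAL_TIME_WINDOW)
}
```

Resolving the default once at the boundary is what lets downstream code hold a non-optional `Duration`.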
@v0y4g3r v0y4g3r force-pushed the fix/default-time-window branch from 911482e to 47abc22 Compare June 3, 2025 08:04
@v0y4g3r v0y4g3r requested review from evenyag and fengjiachun June 3, 2025 08:09
Collaborator

@fengjiachun fengjiachun left a comment

LGTM

 ### Improve Error Context in `time_partition.rs`

 - Enhanced error context message in `time_partition.rs` to provide clearer information on partition time range issues, including bucket size details.

Signed-off-by: Lei, HUANG <[email protected]>
@v0y4g3r v0y4g3r enabled auto-merge June 8, 2025 15:53
@v0y4g3r v0y4g3r added this pull request to the merge queue Jun 8, 2025
Merged via the queue into GreptimeTeam:main with commit 69870e2 Jun 8, 2025
39 checks passed
@v0y4g3r v0y4g3r deleted the fix/default-time-window branch June 8, 2025 16:46
Labels
docs-not-required This change does not impact docs.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimize time window inferring logic
3 participants