-
Notifications
You must be signed in to change notification settings - Fork 536
datastore: apply schema changes immediately to committed state. #2685
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces immediate application of schema changes to the committed datastore state and updates various index and schema manipulation APIs. Key changes include the addition of new merging functions (can_merge) for several index types, revised schema mutation and index deletion APIs in Table, and significant restructuring of transaction state and rollback mechanisms in the core datastore layer.
Reviewed Changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
crates/table/src/table_index/uniquemap.rs | Added a can_merge method to verify if two UniqueMaps can be merged |
crates/table/src/table_index/unique_direct_index.rs | Added a can_merge method for unique direct indices using a zip iteration over inner collections |
crates/table/src/table_index/unique_direct_fixed_cap_index.rs | Added a can_merge function to similar effect for fixed capacity indices |
crates/table/src/table_index/mod.rs | Introduced a could_merge method for TableIndex that dispatches to the appropriate can_merge functions |
crates/table/src/table.rs | Modified schema mutation functions and index add/delete operations with updated pointer map handling |
crates/schema/src/schema.rs | Revised adjacent schema modifications using a new take_adjacent_schemas API and helper find_remove |
crates/core/src/db/datastore/locking_tx_datastore/* | Updated transaction state and rollback handling to immediately reflect schema changes, plus various related API adjustments |
Comments suppressed due to low confidence (3)
crates/table/src/table_index/uniquemap.rs:87
- Add unit tests for the new can_merge method to verify its behavior when merging maps that have conflicting keys as well as when no conflicts are present.
pub(crate) fn can_merge(&self, other: &UniqueMap<K, V>) -> Result<(), &V> {
crates/table/src/table_index/unique_direct_index.rs:210
- Ensure that unit tests are added for can_merge to cover scenarios where inner slot conflicts occur and where merge should succeed without errors.
pub(crate) fn can_merge(&self, other: &UniqueDirectIndex) -> Result<(), RowPointer> {
crates/table/src/table_index/unique_direct_fixed_cap_index.rs:126
- Introduce tests for the new can_merge function to confirm that the function correctly identifies merge conflicts when both slots are occupied.
pub(crate) fn can_merge(&self, other: &UniqueDirectFixedCapIndex) -> Result<(), RowPointer> {
a3ecb73
to
3661d73
Compare
This comment was marked as spam.
This comment was marked as spam.
b0f1860
to
fb73fe9
Compare
0f9277b
to
c278d8f
Compare
e70a32b
to
19ef0f5
Compare
…t_create_drop_sequence_transactionality
19ef0f5
to
0de8fcf
Compare
Improved the PR description. |
Description of Changes
The general idea of this PR is to apply the in-memory changes due to schema changes such as
create_table
anddrop_index
directly to the committed state. By doing so, we make the schema changing datastore APIs care about that complexity and remove (substantial) complexity from index scans, inserts, updates, etc. In the event of a rollback, e.g., due to a failed transaction, we revert these changes to the committed state by using undo state that we've stored intx_state.pending_schema_changes
. These undos are designed to be very cheap and avoid re-computing things like dropped indices. By re-architecting schema changes this way it becomes easy to make schema changes transactional, and so we fix that in this PR.This PR also fully leverages having new indices immediately in the committed state by assuming that if a row pointer occurs in an index, it exists in its table. That means we can get a
RowRef
unchecked. Moreover, we unify the infrastructure of scans in the PR and improve code sharing, deleting a bunch of unnecessary custom code in the process.The PR also substantially simplifies insert and update logic, finally sharing a bunch of code between them.
Fixes #865.
API and ABI breaking changes
None
Expected complexity level and risk
4, this is isolated to the datastore, but the datastore is critical, subtle, and contains various unsafe.
Testing
Existing tests are amended to assert aspects about schema change transactionality.
New tests
test_create_drop_sequence_transactionality
,test_drop_table_is_transactional
, andtest_create_table_is_transactional
are added.Benchmarks
Running the 10^6 updates on
master
(i7-7700K, 64GB RAM)Running the 10^6 updates on
centril/datastore-revamp-schema-changes
Running the subscriptions benchmark (baseline is master):
There is some noise in this benchmark, but -8% on
footprint-semijoin
at least shows up consistently and seems real.