Skip to content

improvement: rework Store.toSQL function to minimize UPDATE statements #21

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 18 commits into
base: main
Choose a base branch
from

Conversation

jgoux
Copy link
Contributor

@jgoux jgoux commented Mar 6, 2024

We have a user who encountered a "bug" because we split our insertions between INSERT and UPDATE statements.

This PR's goal is to use as many INSERT statements as we can. UPDATE statements should only be used if we encounter a circular dependency.

The strategy here is to follow the rows structure in order to generate the statements.

For a given row, we look for parents, and if there are, we recursively (sorry @avallete 😂) create those parents statements.
It ensures that we will always create the rows in the right order, so we can get rid of the UPDATE.

Context: https://discord.com/channels/788353076129038346/1214886051886268426

I also updated the dependencies and rewrote some imports, I commented on the interesting files so it's easier to follow. 👍

@jgoux jgoux changed the title improvement: rework Store.toSQL function improvement: rework Store.toSQL function to minimize UPDATE statements Mar 6, 2024
@jgoux jgoux marked this pull request as ready for review March 7, 2024 17:04
@@ -18,9 +40,230 @@ export abstract class StoreBase implements Store {
);
}

protected _toSQL(props: {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the relevant part with the algorithm.

Comment on lines +12 to +16
return this._toSQL({
serializeInsertStatement,
serializeUpdateStatement,
getEndStatements,
});
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dialect implementation is now slimmed down to serializing the insert and update statements! (+ providing optional "end statements", in Postgres' case, to fix the sequences).

@@ -253,5 +253,50 @@ describe.each(["postgresJs"] as const)("store: %s", (adapter) => {
]),
);
});
test("should handle circular references", async () => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a simple case, but I'm sure @avallete will be able to provide more elaborated ones and break all the things! 😂

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

crack his fingers

@jgoux jgoux requested review from justinvdm and avallete March 11, 2024 08:12
import { type CodegenContext } from "#core/codegen/codegen.js";
import { type Shape, type TableShapePredictions } from "#trpc/shapes.js";
import { type CodegenContext } from "../../codegen/codegen.js";
import { shouldGenerateFieldValue } from "../../dataModel/shouldGenerateFieldValue.js";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jgoux bikeshed, but just checking if these .. paths are intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, they live in the same "module" which is #core.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, so convention-wise, we should use ../../ if its in the same "module" (e.g. code in core/ importing other code in core/, and # if its in a different module (e.g. code in dialect/ importing code in core/)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but it's very informal, so I wouldn't obsess on it. It's just a way for me to keep layers of responsibility separated. If I see a "#" I know it's from another layer of responsibility.

// do we need to keep track of the exact chain of parents?
if (ctx.pending[parent.type].includes(parentRowId)) {
if (parent.isRequired) {
throw new Error(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question

Should I try to add tests case on this PR so we address that before merging ?

I wonder if the fact that we don't handle circular deps will decrease the UX for most of the users or not. Because before we were just appending some update and that was it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't handle non-nullable circular deps, which wouldn't be insertable anyway.
I don't think this comment is relevant anymore, I think the way we generate UPDATE statements and replace these nullable FK with NULL when necessary solves it. But you can try breaking it, of course, maybe I'm missing obvious cases!

avallete added 2 commits March 11, 2024 11:00
serializeUpdateStatement: ctx.serializeUpdateStatement,
});

ctx.insertStatements.push(insertStatement);
Copy link
Contributor

@justinvdm justinvdm Mar 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the idea that this line would be reached for parents first, then their children later? Such that we're always inserting parents before children, to avoid needing to later on update already inserted rows?

const insertStatements: Array<string> = [];
const updateStatements: Array<string> = [];

const sortedModels = sortModels(this.dataModel);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what reason do we topologically sort the models? To make sure the parents are reached before other ancestors, or something?

].join(EOL),
);
} else {
updatableParents.push(parent);
Copy link
Contributor

@justinvdm justinvdm Mar 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this line only be reached if there is a cyclic dependency + there is at least one nullable relation in the chain causing the cycle?

foreign key (last_order_id) references "order"(id);

alter table product add constraint fk_first_order
foreign key (first_order_id) references "order"(id);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth also testing cycles where there are relations in-between? e.g. instead of order->customer->order and order->product->order, something like order->transaction->customer->order, or something

@justinvdm justinvdm self-requested a review March 11, 2024 10:40
Copy link
Contributor

@justinvdm justinvdm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just left a bunch of questions for my understanding, but looks great! Code makes sense (provided I understood it correctly :D but the questions should indicate that at least). Nice that we also were able to abstract out the bulk of the logic so it is not per-dialect. Great work.

avallete added 2 commits March 11, 2024 11:56
@jgoux jgoux marked this pull request as draft March 12, 2024 09:52
@jgoux
Copy link
Contributor Author

jgoux commented Mar 12, 2024

Reconverting the PR to draft as @avallete was able to destroy my implementation with more complicated tests!
I'll focus on this PR while I'm done with the CLI.
@justinvdm I'll address your comment then, as the implementation will evolve. 👍

@justinvdm
Copy link
Contributor

Reconverting the PR to draft as @avallete was able to destroy my implementation with more complicated tests! I'll focus on this PR while I'm done with the CLI. @justinvdm I'll address your comment then, as the implementation will evolve. 👍

Cool, but was really just me wanting to understand the approach. If the implementation changing though, then I'll just re-grok and re-ask when the PR is open again, so works for me.

@jgoux
Copy link
Contributor Author

jgoux commented Apr 4, 2024

I'd like us to consider using a more strategic approach:

  1. attempt to disable FK checks (doable in both pg and sqlite)
  2. if 1. isn't possible, using the approach in this PR
  3. if 2. fails because of cycles maybe we could fallback on the current behaviour?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants