Introduce option to setup transaction before executing queries #3471

Open
wants to merge 19 commits into
base: main

Conversation

ScottDugas
Collaborator

@ScottDugas ScottDugas commented Jul 7, 2025

This adds the ability to specify a transaction setup in yamsql.
More specifically, an individual query can have a setup: entry, which will be executed at the start of every transaction for that query.

To avoid writing the same setup repeatedly, it also adds a new block, transaction_setups:, which allows you to create references for the setups. Inside the query, a setup can then be referenced by its key, using setupReference:.

This can be useful if you want to test with temporary functions, but still have all the benefits of randomized, and forced-continuations.

This currently only supports a single query in the setup, but it wouldn't be hard to extend it to take an array.
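
A minimal sketch of why the setup has to run at the start of every transaction, assuming (as the `on commit drop function` clause in the examples suggests) that a temporary function disappears at commit. `SetupPerTransactionSketch` and `FakeConnection` are hypothetical names used only for illustration; they are not part of yamsql or the relational API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model only: not the yamsql executor. */
final class SetupPerTransactionSketch {

    /** Hypothetical stand-in for a connection whose temporary functions vanish on commit. */
    static final class FakeConnection {
        private final Set<String> temporaryFunctions = new HashSet<>();
        final List<String> log = new ArrayList<>();

        void createTemporaryFunction(String name) {
            temporaryFunctions.add(name);
            log.add("setup:" + name);
        }

        void query(String functionName) {
            if (!temporaryFunctions.contains(functionName)) {
                throw new IllegalStateException("Unknown function: " + functionName);
            }
            log.add("query:" + functionName);
        }

        void commit() {
            // Models "on commit drop function": nothing temporary survives the commit.
            temporaryFunctions.clear();
            log.add("commit");
        }
    }

    /** Runs the query once per transaction, re-applying every setup first, in declaration order. */
    static List<String> run(List<String> setupFunctions, String queriedFunction, int transactions) {
        FakeConnection connection = new FakeConnection();
        for (int i = 0; i < transactions; i++) {
            for (String function : setupFunctions) {
                connection.createTemporaryFunction(function);
            }
            connection.query(queriedFunction);
            connection.commit();
        }
        return connection.log;
    }
}
```

In this model, `run(List.of("t1"), "t1", 2)` records setup, query, commit twice in a row; skipping the setup in the second transaction would make the query fail because `t1` was dropped at the first commit.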

Introduces a setup query config to have code executed transactionally
with the query itself.
Also, a way to define the setup in one place, and reuse it.
Also, fix some issues with transactional code.

This gets it working for embedded and jdbc-in-process
I also added an ExternalSingleServerConfig to help myself debug,
and figured it was worth keeping.
create table t1(id bigint, col1 bigint, primary key(id))
create table table_t1(id bigint, col1 bigint, primary key(id))
---
transaction_setups:
Collaborator Author

I looked into using yaml's repeated nodes feature, but since each test_block is a separate "document" you can't really do that, so instead I'm adding our own reference section.

create table table_t1(id bigint, col1 bigint, primary key(id))
---
transaction_setups:
t1_less_than_50: create temporary function t1() on commit drop function AS SELECT * FROM table_t1 where id < 50
Collaborator Author

In theory, it wouldn't be hard to change this to allow an array of strings if there's a bunch of setup steps

@ScottDugas ScottDugas added the testing improvement Change that improves our testing label Jul 7, 2025
@ScottDugas ScottDugas changed the title Yaml transactions Introduce option to setup transaction before executing queries Jul 14, 2025
Comment on lines +56 to +57
maxRows:;initialVersionAtLeast:;initialVersionLessThan:;;mode:;repetition:;seed:;setup:;transaction_setups:;
setupReference:;check_cache:;connection_lifecycle:;steps:;preset:;statement_type:;!r;!in;!a;" />
Collaborator Author

We seem pretty inconsistent in general about whether things should be snake_case or camelCase.

@ScottDugas ScottDugas requested a review from g31pranjal July 14, 2025 21:57
@ScottDugas ScottDugas marked this pull request as ready for review July 14, 2025 21:58
@ScottDugas ScottDugas marked this pull request as draft July 15, 2025 14:38
The bug was fixed in 4.4.7.0 so this no longer needs to be
!current_version
@ScottDugas ScottDugas marked this pull request as ready for review July 15, 2025 14:48
Member

@g31pranjal g31pranjal left a comment

I remember that you introduced some unit tests to support the supported_version functionality. Would it be good to have some like those here, to check for disallowed yaml layouts such as having setups before the query, or key collisions?

Comment on lines 210 to 213
} else if (QueryConfig.QUERY_CONFIG_SETUP.equals(queryConfig.getConfigName())) {
Assert.that(!queryIsRunning, "Transaction setup should not be intermingled with query results");
executor.addSetup(Matchers.string(Matchers.notNull(queryConfig.getVal(), "Setup Config Val"),
"Transaction setup"));
Member

There seems to be some merit in enforcing "setup" configs to always appear consecutively and directly above "query"? I suppose that will be more visibly clear given what "setup" is supposed to do.

Collaborator Author

We could do that, but the way we figure out what Command it is, is by the first entry, so we would have to change that logic in order to support putting the setup first. I feel like that's a bit more refactoring than is worth bringing in here.
It's also possible that we're pushing up against the limits of the original design, and changing it to be a map might make more sense, but that would be a pretty big change.
One other way to think about it is that the query itself is the test name, and thus should go first.

Comment on lines +206 to +207
final RelationalStatement statement = singleConnection.createStatement();
statement.execute(setupStatement);
Member

Hmm, while I see this working well for creating temporary functions, I am unclear on the scope of SQL statements that can be used in "setup". For instance, UPDATE .... RETURNING ... won't work here, as it requires slurping the result set. Now that I think of it, DQL works fine and so does DML, except for the case I mentioned above.

Member

I think we should either:

  1. restrict the scope of statements here (maybe to only CREATE TEMPORARY FUNCTION ....) by some sort of pattern matching, or error out if the execute returns a ResultSet (barring DQL completely),
  2. or softly specify its intended usages and cautions in the documentation.

Open to your thoughts on this!

Collaborator Author

Yeah, that's a good callout.
We currently commit at the end of the query (maybe that's a mistake), and so having other DDL would cause problems. I forbid it for now, and put a comment there that this might not stay if we can figure out how we want to use it in other situations.
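
As an illustration of the pattern-matching restriction discussed above, a minimal sketch might look like the following. `SetupStatementValidator` is a hypothetical helper, not the check that was added in this PR, and the regular expression is an assumption about what "only CREATE TEMPORARY FUNCTION" could mean.

```java
import java.util.regex.Pattern;

/** Hypothetical validator: accepts only statements that start with CREATE TEMPORARY FUNCTION. */
final class SetupStatementValidator {

    // Case-insensitive; DOTALL so that a statement continued on the next line still matches.
    private static final Pattern ALLOWED = Pattern.compile(
            "\\s*create\\s+temporary\\s+function\\b.*",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns true if the statement looks like a temporary function definition. */
    static boolean isAllowed(String sql) {
        return sql != null && ALLOWED.matcher(sql).matches();
    }

    /** Returns the statement unchanged, or throws if it is not an allowed setup statement. */
    static String requireAllowed(String sql) {
        if (!isAllowed(sql)) {
            throw new IllegalArgumentException(
                    "Only CREATE TEMPORARY FUNCTION is allowed in a transaction setup: " + sql);
        }
        return sql;
    }
}
```

A prefix check like this is deliberately conservative: it rejects `UPDATE ... RETURNING ...` and plain queries without having to inspect whether executing them would return a ResultSet.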

Comment on lines +52 to +60
- setup: create temporary function t1() on commit drop function AS
SELECT * FROM table_t1 where id < 30;
- result: []
-
# This query references the transaction setup from above. This behaves exactly the same as the inline version
# it just allows easier use of the same setup for many queries.
- query: select * from t1 where id > 15;
- setupReference: t1_less_than_50
- result: [{id: 30, col1: 40}]
Member

It would be good to add cases here that showcase/test the use of multiple setup configs (looks like that is supported!) and show that the order of execution is preserved!

Member

Another case could be to have a mix of setup and setupReference intermingled

Collaborator Author

Yeah, that's a good callout. I added a bunch of tests.

# The map is basically from a name (e.g. t1_less_than_50) to a statement to run at the start of every query.
# They're referenced in the query using `setupReference`. Right now it only supports a single statement as a string, but
# it could be expanded to support an array in the future.
transaction_setups:
Member

@g31pranjal g31pranjal Jul 17, 2025

It seems like multiple transaction_setups blocks are allowed, so they can be spread all over the file. However, since all of them are processed during parsing, the lifecycle of all the key-value pairs is the entire runtime. I think it would be good to disallow that, or to have an example demonstrating this behavior.

Collaborator Author

It appears to work as expected; I added some tests that cover this, and show that only test_blocks after the transaction_setup can refer to it. It doesn't make a lot of sense if you have just tests, but if you have some tests, some setup, and then some transaction_setups it may make more sense.
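
The scoping described here can be pictured with a small registry that is filled as blocks are encountered, so a reference only resolves once its transaction_setups block has been processed. This is a sketch under assumptions: `TransactionSetupRegistry` is a hypothetical name, and rejecting duplicate keys is an assumed policy based on the key-collision concern raised in review, not confirmed behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical registry mapping a setup name (e.g. t1_less_than_50) to its statement. */
final class TransactionSetupRegistry {

    private final Map<String, String> setups = new LinkedHashMap<>();

    /** Called when a transaction_setups entry is encountered; duplicate names are rejected (assumption). */
    void register(String name, String statement) {
        Objects.requireNonNull(name, "name");
        Objects.requireNonNull(statement, "statement");
        if (setups.putIfAbsent(name, statement) != null) {
            throw new IllegalArgumentException("Duplicate transaction setup: " + name);
        }
    }

    /** Called when a query uses setupReference; fails if the name has not been registered yet. */
    String resolve(String name) {
        String statement = setups.get(name);
        if (statement == null) {
            throw new IllegalArgumentException("Unknown setupReference: " + name);
        }
        return statement;
    }
}
```

Because entries are added in file order, a test block that appears before the transaction_setups block would call `resolve` on an empty registry and fail, while later blocks would see the registered statement.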

@@ -0,0 +1,30 @@
/*
* SQLSupplier.java
Member

name

Some of these test checks that were suggested in the PR but are not
yet implemented
setup could be used for inserts, but it may also be used for
installing schema via metadata. Everything except test blocks
are executed when they are hit, so this seems logical
It does do what you would expect in terms of registering
Other operations are hard to reason about, especially with parallelism,
so don't allow it, at least for now.
Labels
testing improvement Change that improves our testing
2 participants