
Conversation

@yuzefovich (Member) commented Oct 20, 2025

Backport 6/6 commits from #155135.
Backport 1/1 commits from #156002.

/cc @cockroachdb/release


This PR contains several commits that revive our usage of the TPCH dataset. Most of the roachtests are now switched to using the `workload fixtures import` command. The tpch workload itself has been adjusted to no longer hard-code "expected rows" for the "golden" TPCH backup. See each commit for details.

Fixes: #153489.
Fixes: #153566.
Fixes: #154173.

Release justification: effectively test-only change.

blathers-crl bot commented Oct 20, 2025

Thanks for opening a backport.

Before merging, please confirm that the change does not break backwards compatibility and otherwise complies with the backport policy. Include a brief release justification in the PR description explaining why the backport is appropriate. All backports must be reviewed by the TL for the owning area. While the stricter LTS policy does not yet apply, please exercise judgment and consider gating non-critical changes behind a disabled-by-default feature flag when appropriate.

blathers-crl bot added the backport (Label PRs that are backports to older release branches) and T-sql-queries (SQL Queries Team) labels on Oct 20, 2025
blathers-crl bot commented Oct 20, 2025

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity (Member) commented

This change is Reviewable

@yuzefovich force-pushed the backport25.4-155135 branch 2 times, most recently from e924a53 to 7b09b48, on October 23, 2025 at 23:30

I don't think we need the `vectorize` and `default-vectorize` flags anymore
(they were introduced a long time ago, when the vectorized engine was
under active development). Additionally, we forgot to remove the last
mention of the already removed `distsql` flag.

Release note: None

The `enable-checks` flag of the TPCH workload allows checking actual query
results against a hard-coded set of expected results (which were
obtained against the "golden" TPCH backup about 6 years ago). To avoid
hard-coding thousands of rows, queries 11 and 16 only had their row
counts checked, while all other queries received an exact comparison
check.

However, in 852a2b3 we broke this
logic: during the PR review, the code was refactored in such a way that
the expected row count for every query was included in the map, and as
a result we stopped doing the exact comparison check (since each query
had an entry in the `numExpectedRowsByQueryNumber` map) and only did the
row count check for all queries.

This commit fixes that oversight - we now only include queries 11 and 16
in the map, so all other queries get the exact comparison check.
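
A minimal sketch of the intended dispatch after this fix, assuming a map keyed by query number (the real workload code is more involved, and the row counts shown are only illustrative):

```go
package tpchcheck

import "fmt"

// Only the queries whose full output is impractical to hard-code (11 and 16)
// get a row-count-only check; everything else is compared row by row.
var numExpectedRowsByQueryNumber = map[int]int{
	11: 1048,  // illustrative count
	16: 18314, // illustrative count
}

// checkResult verifies a query's result either by row count alone or by an
// exact comparison against the hard-coded expected rows.
func checkResult(queryNum int, got, expected [][]string) error {
	if want, ok := numExpectedRowsByQueryNumber[queryNum]; ok {
		if len(got) != want {
			return fmt.Errorf("query %d: expected %d rows, found %d", queryNum, want, len(got))
		}
		return nil
	}
	if len(got) != len(expected) {
		return fmt.Errorf("query %d: expected %d rows, found %d", queryNum, len(expected), len(got))
	}
	for i := range got {
		if len(got[i]) != len(expected[i]) {
			return fmt.Errorf("query %d: column count mismatch at row %d", queryNum, i)
		}
		for j := range got[i] {
			if got[i][j] != expected[i][j] {
				return fmt.Errorf("query %d: mismatch at row %d, column %d", queryNum, i, j)
			}
		}
	}
	return nil
}
```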

Release note: None

Before the fix in the previous commit, whenever the `enable-checks` flag was
specified, we only did the row count check (i.e. verified that a TPCH
query returned the expected number of rows). I briefly looked into
whether we could make the exact comparison check more universal (as
opposed to only working against the "golden" TPCH backup, which we can no
longer restore) but decided it's not worth the effort.

Given that we've lived with the row-count-only check for many years now,
this commit embraces the effect of that bug and refactors the `tpch`
workload to only do the row count check (for the 17 out of 22 queries
whose number of returned rows doesn't seem to change with a different
fixture seed or scale factor). As a small win, this check is now done
unconditionally, so the `enable-checks` flag has been removed.
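
A sketch of where this lands, under the assumption that the remaining map only holds the queries with stable row counts (the names and particular entries below are illustrative, not the workload's actual code):

```go
package tpchcheck

import "fmt"

// Expected row counts are kept only for queries whose counts don't depend on
// the fixture seed or scale factor; the real map covers 17 of the 22 queries.
var expectedRowCounts = map[int]int{
	1:  4, // Q1 groups by l_returnflag, l_linestatus
	6:  1, // Q6 is a single aggregate
	14: 1, // Q14 is a single aggregate
}

// verifyRowCount runs unconditionally; there is no enable-checks flag to gate
// it anymore. Queries without an entry are simply not verified.
func verifyRowCount(queryNum, gotRows int) error {
	want, ok := expectedRowCounts[queryNum]
	if !ok {
		return nil
	}
	if gotRows != want {
		return fmt.Errorf("query %d: expected %d rows, found %d", queryNum, want, gotRows)
	}
	return nil
}
```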

This patch will allow us to switch to using the `fixtures import` command
instead of trying to restore the "golden" TPCH backup.

Release note: None

When we implemented the tpch data load, we copied the logic of adding FKs as
NOT VALID (in order to speed up the test) over from tpcc. I don't think we
are as concerned about the performance of the initial ingestion of the tpch
dataset, so this commit makes these ALTER statements add FKs that are
validated right away. In my local reproduction when importing Scale Factor 1,
data ingestion took about 1 minute and FK validation took on the order of
40s, which seems acceptable. On the plus side, the imported fixtures now
match the schema that we previously got by restoring the "golden" TPCH
backup.

Additionally, one of the FKs was commented out due to an error, but
we're no longer hitting that error, so it's now also added.

Furthermore, the 3 spots where we hard-code the TPCH schema have been
updated to include validated FKs as well as the previously commented-out one.
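
For illustration, a hedged sketch of what the validated FK statements look like; the table and column names follow the standard TPC-H schema, while the constraint names and the exact set of statements in the workload are assumptions:

```go
package tpchschema

// Adding the constraints without NOT VALID means they are validated as part
// of the ALTER itself, matching the schema of the old "golden" backup.
var tpchFKStmts = []string{
	`ALTER TABLE nation ADD CONSTRAINT nation_fk_region FOREIGN KEY (n_regionkey) REFERENCES region (r_regionkey)`,
	`ALTER TABLE supplier ADD CONSTRAINT supplier_fk_nation FOREIGN KEY (s_nationkey) REFERENCES nation (n_nationkey)`,
	`ALTER TABLE customer ADD CONSTRAINT customer_fk_nation FOREIGN KEY (c_nationkey) REFERENCES nation (n_nationkey)`,
	`ALTER TABLE orders ADD CONSTRAINT orders_fk_customer FOREIGN KEY (o_custkey) REFERENCES customer (c_custkey)`,
	`ALTER TABLE partsupp ADD CONSTRAINT partsupp_fk_part FOREIGN KEY (ps_partkey) REFERENCES part (p_partkey)`,
	`ALTER TABLE partsupp ADD CONSTRAINT partsupp_fk_supplier FOREIGN KEY (ps_suppkey) REFERENCES supplier (s_suppkey)`,
	`ALTER TABLE lineitem ADD CONSTRAINT lineitem_fk_orders FOREIGN KEY (l_orderkey) REFERENCES orders (o_orderkey)`,
	`ALTER TABLE lineitem ADD CONSTRAINT lineitem_fk_partsupp FOREIGN KEY (l_partkey, l_suppkey) REFERENCES partsupp (ps_partkey, ps_suppkey)`,
	// Previously these statements would have ended in NOT VALID (copied from
	// tpcc), requiring a separate validation step later.
}
```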

Release note: None

The tpcc, tpch, and movr workloads add foreign keys, and since the PostLoad
step must be idempotent, they need to ignore errors when an FK already
exists. Previously, we matched against the old error message, and this commit
fixes that oversight.
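
A minimal sketch of the idempotency requirement, assuming the post-load hook issues the ALTER statements through database/sql; the substring matched below is an assumption, and keeping it in sync with the server's current error text is exactly what this commit updates:

```go
package postload

import (
	"database/sql"
	"strings"
)

// addFKsIdempotently applies the FK statements and treats "the constraint is
// already there" as success, so that running the hook twice is harmless.
func addFKsIdempotently(db *sql.DB, stmts []string) error {
	for _, stmt := range stmts {
		if _, err := db.Exec(stmt); err != nil {
			// Assumed current error text; an outdated substring here would
			// make reruns fail, which is the oversight being fixed.
			if strings.Contains(err.Error(), "already exists") {
				continue
			}
			return err
		}
	}
	return nil
}
```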

Release note: None

For a long time we've been using the "golden" TPCH backup, which can no
longer be restored (since it was created 5+ years ago). Given the change
in how we do "checks" in the TPCH workload, we can now easily switch to
using fixtures. Many roachtests owned by Queries are affected, in
addition to the schemachange test (which uses a Scale Factor of 10) - it
has now been refactored to use the same helper. The helper has been
extended to also support secondary tenants.
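
A hedged sketch of what the fixture-based setup boils down to (not the actual roachtest helper; the binary path, flags, and connection URL handling are assumptions for illustration):

```go
package main

import (
	"fmt"
	"os/exec"
)

// importTPCHFixture loads the tpch dataset via the workload's fixtures
// import path instead of restoring the old "golden" backup.
func importTPCHFixture(pgurl string, scaleFactor int) error {
	cmd := exec.Command(
		"./cockroach", "workload", "fixtures", "import", "tpch",
		fmt.Sprintf("--scale-factor=%d", scaleFactor),
		pgurl,
	)
	if out, err := cmd.CombinedOutput(); err != nil {
		return fmt.Errorf("fixtures import failed: %v\n%s", err, out)
	}
	return nil
}

func main() {
	// Example invocation against a locally running cluster (hypothetical URL).
	if err := importTPCHFixture("postgres://root@localhost:26257?sslmode=disable", 1); err != nil {
		fmt.Println(err)
	}
}
```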

Additionally, I re-generated a new backup with the 25.3 version for scale
factor 1 - it is now used only by the sqlsmith roachtest (which wants to
restore into `defaultdb`, something that doesn't seem to be easy with
fixtures since they hard-code the database name to be equal to the name of
the fixture). The tpcc and tpcds backups have also been regenerated.

In terms of time difference, restoring the backup takes about 20s
whereas importing the fixture with validated FKs (plus collecting table
stats) takes about 2m 15s. This seems acceptable.

I decided to remove the `import-sf10.sql` script since it's effectively
a copy of the `sf1` one and doesn't seem worth maintaining separately.

Release note: None

When we added the fixtures generation of the TPCH spec, we forgot a couple
of values - one for "priorities" and one for "containers". This commit
fixes that omission. In particular, the missing priority value made us
return 4 rows instead of 5 for Q4, and the expectation has been adjusted
accordingly.

Note that this omission was exposed via a roachtest that uses correctly
generated fixtures of SF 100.
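
For reference, the value lists below come from the TPC-H specification; this is only a sketch with illustrative variable names, not the generator's actual identifiers:

```go
package tpchgen

// Q4 groups orders by o_orderpriority, so with all five priorities present in
// the generated data it returns 5 rows; a missing value here is what produced
// the 4-row result mentioned above.
var orderPriorities = []string{
	"1-URGENT", "2-HIGH", "3-MEDIUM", "4-NOT SPECIFIED", "5-LOW",
}

// p_container values are built by combining a size syllable with a type
// syllable, so a missing entry shrinks the set of generated containers.
var (
	containerSizes = []string{"SM", "LG", "MED", "JUMBO", "WRAP"}
	containerTypes = []string{"CASE", "BOX", "BAG", "JAR", "PKG", "PACK", "CAN", "DRUM"}
)
```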

Release note: None