PS-10231 [9.x]: Fix DBLWR recovery tests for compressed+encrypted pages #5844

Merged
inikep merged 1 commit into percona:trunk from inikep:PS-10231-9.x
Mar 2, 2026

inikep (Collaborator) commented Feb 26, 2026

The innodb.dblwr_lz4_encrypt_recv and innodb.dblwr_zlib_encrypt_recv tests were failing because the DBLWR copy of the test table's root page was being overwritten by background flush activity (undo purge, system tablespace) between FLUSH TABLES FOR EXPORT and SIGKILL. This race became more likely after Bug#37684656 reduced the DBLWR buffer size.
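
The slot-reuse race described above can be illustrated with a toy model. This is only a sketch under loud assumptions: `ToyDoublewriteBuffer`, `survives_background_flushes`, the space and page numbers, and the oldest-first eviction policy are all hypothetical and do not reflect InnoDB's actual DBLWR slot allocation. It shows only the general effect that a smaller buffer makes it more likely for unrelated flushes to displace the copy the test depends on.

```python
from collections import OrderedDict


class ToyDoublewriteBuffer:
    """Toy fixed-size buffer of page copies (NOT InnoDB's real layout)."""

    def __init__(self, slots):
        self.slots = slots
        self.copies = OrderedDict()  # (space_id, page_no) -> page bytes

    def write(self, space_id, page_no, data):
        key = (space_id, page_no)
        self.copies.pop(key, None)   # a rewrite refreshes the page's position
        self.copies[key] = data
        while len(self.copies) > self.slots:
            self.copies.popitem(last=False)  # reuse the oldest slot

    def has_copy(self, space_id, page_no):
        return (space_id, page_no) in self.copies


def survives_background_flushes(slots, background_pages):
    """Return True if the test page copy is still present after other flushes."""
    dblwr = ToyDoublewriteBuffer(slots)
    # Hypothetical identifiers for the test table's page.
    dblwr.write(space_id=5, page_no=3, data=b"test table page")
    # Unrelated background activity (e.g. purge or system tablespace pages).
    for n in range(background_pages):
        dblwr.write(space_id=0, page_no=n, data=b"background page")
    return dblwr.has_copy(5, 3)


if __name__ == "__main__":
    print(survives_background_flushes(slots=8, background_pages=20))    # False
    print(survives_background_flushes(slots=512, background_pages=20))  # True
```

In this model, stopping background writes before the kill or enlarging the buffer both keep the copy alive.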

Additionally, without pending redo records for the test tablespace after checkpoint, crash recovery never opened it, so the per-space DBLWR recovery path never executed.

Restructure both tests to follow the robust pattern used by innodb.dblwr_encrypt_recover:

  • Wait for purge to complete before flushing
  • Disable master thread and checkpoint after flush to prevent background DBLWR slot reuse
  • Perform an uncommitted INSERT to generate pending redo records, ensuring the tablespace is opened during crash recovery
  • Kill the server first, then corrupt the page externally while the server is down (guaranteeing the DBLWR copy survives)
  • Zero the entire page (ALL_ZEROES=1), because for compressed pages with punch hole the second half is already zeros, so corrupting only that half has no effect
  • Add master.opt with --innodb_doublewrite_pages=512 for extra margin
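
The reasoning behind zeroing the whole page can be checked with a small standalone sketch. It is not the test's actual corruption helper: the 16 KiB page size, the file layout, and the function names (`corrupt_second_half`, `zero_whole_page`, `demo`) are assumptions made for illustration. The sketch builds one page whose second half is already zeros, as the description says of compressed pages with punch hole, and compares the two corruption strategies.

```python
import os
import tempfile

PAGE_SIZE = 16384  # assumed page size for this illustration


def read_page(path, page_no, page_size=PAGE_SIZE):
    with open(path, "rb") as f:
        f.seek(page_no * page_size)
        return f.read(page_size)


def corrupt_second_half(path, page_no, page_size=PAGE_SIZE):
    """Partial corruption: overwrite only the second half of the page with zeros."""
    with open(path, "r+b") as f:
        f.seek(page_no * page_size + page_size // 2)
        f.write(b"\0" * (page_size // 2))


def zero_whole_page(path, page_no, page_size=PAGE_SIZE):
    """Full corruption: overwrite the entire page with zeros."""
    with open(path, "r+b") as f:
        f.seek(page_no * page_size)
        f.write(b"\0" * page_size)


def demo():
    """Return (partial_changed, full_changed) for a page with a zeroed tail."""
    half = PAGE_SIZE // 2
    # Stand-in for a compressed page: payload in the first half, zeros after it.
    original = b"\x5a" * half + b"\0" * half
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(original)
        corrupt_second_half(path, 0)
        partial_changed = read_page(path, 0) != original
        zero_whole_page(path, 0)
        full_changed = read_page(path, 0) != original
        return partial_changed, full_changed
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(demo())  # (False, True)
```

Only the full-page zeroing alters the on-disk page here, which is the case where recovery would have to fall back to the DBLWR copy.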

percona-ysorokin (Collaborator) left a comment

LGTM

inikep merged commit 0fa8790 into percona:trunk on Mar 2, 2026
3 of 6 checks passed
inikep deleted the PS-10231-9.x branch on March 2, 2026 14:09