MDEV-39694 innodb_log_archive=ON recovery fails if no tablespaces exist#5108
MDEV-39694 innodb_log_archive=ON recovery fails if no tablespaces exist#5108dr-m wants to merge 3 commits into
Conversation
… exist srv_undo_tablespaces_init(): Remove the redundant parameter create_new_db. recv_recovery_from_checkpoint_start(): Remove a redundant check. srv_start_recovery(): Wrapper for the crash recovery code path. TODO: Open the system&undo tablespaces deeper inside this code! srv_start(): Check create_new_db fewer times.
|
|
There was a problem hiding this comment.
Code Review
This pull request refactors the InnoDB startup and recovery logic by extracting recovery procedures into a new srv_start_recovery function and simplifying the srv_undo_tablespaces_init interface. Feedback suggests refactoring the srv_start_recovery logic to avoid a non-idiomatic goto statement and ensuring that specific error codes are returned instead of a generic DB_ERROR during undo tablespace initialization.
| if (srv_operation != SRV_OPERATION_RESTORE) | ||
| { | ||
| err= srv_load_tables(must_upgrade_ibuf); | ||
| if (err == DB_SUCCESS) | ||
| init_lists: | ||
| err= trx_lists_init_at_db_start(); | ||
| if (err == DB_SUCCESS && recv_needed_recovery) | ||
| trx_sys_print_mysql_binlog_offset(); | ||
| } | ||
| else if (recv_needed_recovery) | ||
| goto init_lists; |
There was a problem hiding this comment.
The use of 'goto' to jump into the middle of an 'if' block is non-idiomatic and makes the code harder to maintain. It also relies on a label being placed between an 'if' statement and its body, which is confusing. This logic can be refactored to be much clearer and safer without using 'goto'.
if (srv_operation != SRV_OPERATION_RESTORE) {
err = srv_load_tables(must_upgrade_ibuf);
if (err != DB_SUCCESS) {
return err;
}
}
if (srv_operation != SRV_OPERATION_RESTORE || recv_needed_recovery) {
err = trx_lists_init_at_db_start();
if (err == DB_SUCCESS && recv_needed_recovery) {
trx_sys_print_mysql_binlog_offset();
}
}| return(srv_init_abort(err)); | ||
| } | ||
| if (err != DB_SUCCESS) { | ||
| return srv_init_abort(DB_ERROR); |
There was a problem hiding this comment.
| } else { | ||
| /* We always try to do a recovery, even if the | ||
| database had been shut down normally */ | ||
| log_sys.latch.wr_lock(); | ||
| err = recv_sys.find_checkpoint(); | ||
| log_sys.latch.wr_unlock(); |
There was a problem hiding this comment.
I tried the following revision, moving the redo log processing before the point where the system and undo tablespaces are opened:
diff --git a/storage/innobase/srv/srv0start.cc b/storage/innobase/srv/srv0start.cc
index 67eeaa5ff00..33351f4de10 100644
--- a/storage/innobase/srv/srv0start.cc
+++ b/storage/innobase/srv/srv0start.cc
@@ -1579,6 +1579,10 @@ dberr_t srv_start(bool create_new_db)
err = recv_sys.find_checkpoint();
log_sys.latch.wr_unlock();
+ if (err == DB_SUCCESS) {
+ err = srv_start_recovery();
+ }
+
if (err != DB_SUCCESS) {
return srv_init_abort(err);
}
@@ -1601,10 +1605,6 @@ dberr_t srv_start(bool create_new_db)
if (err == DB_SUCCESS) {
err = srv_undo_tablespaces_init(nullptr);
- if (err == DB_SUCCESS) {
- err = srv_start_recovery();
- }
-
if (err != DB_SUCCESS) {
return srv_init_abort(err);
}I tested the following:
--source include/have_innodb.inc
let bugdir= $MYSQLTEST_VARDIR/tmp/log_archive_recovery;
--mkdir $bugdir
--let $dirs= --innodb-data-home-dir=$bugdir --innodb-log-group-home-dir=$bugdir
--let $dirs=$dirs --innodb-undo-directory=$bugdir
--let $restart_parameters= $dirs --innodb-log-archive --innodb-undo-tablespaces=3 --innodb-log-recovery-start=12288
--source include/restart_mysqld.inc
CREATE TABLE t(a INT)ENGINE=InnoDB;
--source include/shutdown_mysqld.inc
--remove_file $bugdir/ibdata1
--source include/start_mysqld.inc
--error ER_UNKNOWN_STORAGE_ENGINE
SELECT * FROM t;
--write_file $bugdir/ibdata1
EOF
--source include/restart_mysqld.inc
SELECT * FROM t;At the last line, we’d get ER_UNKNOWN_STORAGE_ENGINE because InnoDB does not know about the undo tablespaces (tablespaces 1 through 3):
2026-05-22 8:08:35 0 [Note] InnoDB: Completed initialization of buffer pool
2026-05-22 8:08:35 0 [ERROR] InnoDB: Missing FILE_DELETE or FILE_MODIFY for [page id: space=1, page number=0] at 13314; set innodb_force_recovery=1 to ignore the record.
2026-05-22 8:08:35 0 [ERROR] InnoDB: Missing FILE_DELETE or FILE_MODIFY for [page id: space=1, page number=0] at 17748; set innodb_force_recovery=1 to ignore the record.
2026-05-22 8:08:35 0 [Note] InnoDB: End of log at LSN=17748
2026-05-22 8:08:35 0 [ERROR] InnoDB: Log scan aborted at LSN 17748
2026-05-22 8:08:35 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1587] with error Generic error
2026-05-22 8:08:35 0 [Note] InnoDB: Starting shutdown...
2026-05-22 8:08:35 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
By convention, we do not write FILE_ records for system or undo tablespace files. It would make sense to write them for undo tablespaces, because doing so would address the known problem that archive recovery is broken when the number of innodb_undo_tablespaces has been being changed after the time of innodb_log_recovery_start. If I run the test innodb.undo_upgrade_debug with innodb_log_recovery_start=12288 and innodb_log_archive=ON, it will report many errors like this:
2026-05-22 8:33:24 0 [ERROR] InnoDB: Missing FILE_DELETE or FILE_MODIFY for [page id: space=1, page number=0] at 13314; set innodb_force_recovery=1 to ignore the record.
2026-05-22 8:33:24 0 [ERROR] InnoDB: Missing FILE_DELETE or FILE_MODIFY for [page id: space=1, page number=0] at 17756; set innodb_force_recovery=1 to ignore the record.
2026-05-22 8:33:24 0 [ERROR] InnoDB: Log scan aborted at LSN 17756
Writing FILE_CREATE and FILE_MODIFY records for the undo* files should fix this.
The system tablespace can be handled as a special case, because it always carries the tablespace identifier 0.
There was a problem hiding this comment.
When the function srv_undo_tablespaces_init() is creating undo tablespaces, it creates all of them within a single mini-transaction. That has to be handled as a special case. For normal operation, I would introduce a data member mtr_t::m_undo_space.
For backward compatibility, we will not change the logging when innodb_log_archive=OFF TODO: Fix some tests that are now being disabled.
srv_undo_tablespaces_init(): Remove the redundant parametercreate_new_db.recv_recovery_from_checkpoint_start(): Remove a redundant check.srv_start_recovery(): Wrapper for the crash recovery code path.TODO: Open the system&undo tablespaces deeper inside this code!
srv_start(): Check create_new_db fewer times.