refactor(move-tables): clean up prototype phase 1.2 skip-tables (#8206)#1700

Open
womoruyi wants to merge 7 commits into
move-tables/1.2-skip-ghost-tablesfrom
womoruyi/move-tables-1.2-production

Conversation

@womoruyi

@womoruyi womoruyi commented Jun 5, 2026

Productionize the prototype-hacked phase 1.2 implementation (skip ghost/changelog/heartbeat in move-tables mode) into clean, tested, reviewable code matching the design doc acceptance criteria. No behavior change for existing gh-ost users — move-tables mode itself is opt-in via the --move-tables flag.

What approach did you choose and why?

The prototype on move-tables/1.2-skip-ghost-tables wired phase 1.2 into the existing Migrate() machinery via inline IsMoveTablesMode() guards (see @chriskirkland's TODO(chriskirkland): work left to do block at migrator.go:772-779). It worked end-to-end but was not reviewable as #8206's deliverable. This PR productionizes it.

Scope (#8206 only — the skips):

  1. Replaced WriteChangelog bypass TODO with intentional comment. applier.go:756. The single low-level chokepoint is the right design — there are three callers in move-tables mode (ReadMigrationRangeValues, printStatus, finalCleanup) and gating each individually would scatter guards and drift when new callers are added. Comment now documents why the bypass lives where it does.

  2. Removed demo hack code. Four artifacts marked TEMPORARY / TODO(replace) by @chriskirkland for local demo:

    • simulateMoveTablesCutover() function (migrator.go:933-960)
    • TmpCutoverFilename polling loop in MoveTables() (migrator.go:889-916)
    • TmpCutoverFilename field on the MoveTables struct (context.go:282)
    • --target-cutover-filename flag (main.go:177)
    The //TODO: cutover here comment at migrator.go:889 is not one of the four removed artifacts; it is preserved as the insertion point for #8209 (cooperative cutover orchestration).
  3. Downgraded per-chunk and per-event Info logging to Debugf. Three call sites in migrator.go that fired on every row-copy chunk or every binlog event, flooding operator output. The intentional Info logs ("Skipping stream of the changelog table", "Skipping throttling in move tables mode") remain at Info — operators should see those.

  4. Added tests asserting #8206's acceptance criteria. Each test maps to a specific design-doc requirement:

    • TestInitiateApplierMoveTablesMode_NoGhostOrChangelogTable — asserts no _<table>_gho or _<table>_ghc on source or target (criterion 1)
    • TestWriteChangelogNoOpInMoveTablesMode — asserts WriteChangelog is a no-op
    • TestInitiateStreamingMoveTablesMode_NoChangelogListener — asserts no listener registered for the changelog table
    • TestNoHeartbeatInMoveTablesMode — asserts InitiateHeartbeat goroutine is not started
    • TestFinalCleanupMoveTablesMode_SkipsDrops — asserts finalCleanup skips DropChangelogTable / DropGhostTable / DropOldTable
  5. Fixed test call sites for changed signatures. ReadMigrationRangeValues(*gosql.DB) and CalculateNextIterationRangeEndValues(*gosql.DB) changed signatures in the prototype to accept an explicit DB for cross-cluster reads; pre-existing tests pass nil to preserve standard-mode behavior.
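The single-chokepoint design from item 1 can be pictured with a minimal sketch. The types below (`migrationContext`, `applier`, the `writes` counter) are hypothetical stand-ins and not gh-ost's actual definitions; only the shape of the guard and the empty-string return are taken from this PR's description.

```go
package main

import "fmt"

// migrationContext is a hypothetical stand-in for gh-ost's MigrationContext.
type migrationContext struct {
	moveTables bool
}

func (c *migrationContext) IsMoveTablesMode() bool { return c.moveTables }

// applier is a hypothetical stand-in; writes counts statements that
// would have been executed against the changelog table.
type applier struct {
	ctx    *migrationContext
	writes int
}

// WriteChangelog is the single chokepoint: in move-tables mode there is no
// changelog table, so every caller gets a no-op without carrying its own guard.
func (a *applier) WriteChangelog(hint, value string) (string, error) {
	if a.ctx.IsMoveTablesMode() {
		return "", nil
	}
	a.writes++ // the real code would execute an INSERT here
	return hint, nil
}

func main() {
	standard := &applier{ctx: &migrationContext{moveTables: false}}
	moveTables := &applier{ctx: &migrationContext{moveTables: true}}
	for _, a := range []*applier{standard, moveTables} {
		a.WriteChangelog("state", "copy")
		fmt.Println(a.writes)
	}
}
```

With the guard in one place, a new caller of WriteChangelog added later would inherit the skip automatically, which is the drift argument made in item 1.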

Explicitly NOT in this PR (belongs to #8207):

  • The CreateTargetTable / createTargetTableFromStatement code remains as prototype-hacked in initiateApplier()'s move-tables branch. Productionizing it (clean error handling, abort-if-target-exists per #8207's acceptance criteria) is #8207's scope.
  • The GetTargetDatabaseName() inverted-branches bug at context.go:411 is documented but not fixed here. It affects prepareQueries(), which is #8207/§1.3 territory.

Which feature flags are involved in this change?

  • None. --move-tables mode itself is opt-in via CLI flag; existing gh-ost users on --alter are unaffected because every change in this PR is inside IsMoveTablesMode() branches.

Which environments does this change target?

  • N/A — gh-ost is an operator tool, not a service. Built and shipped via script/build.

Risk assessment

  • Low risk: No behavior change for existing --alter users; every modified code path is gated behind IsMoveTablesMode(). Move-tables mode itself is a POC opt-in via --move-tables, not yet a production path. The WriteChangelog comment swap and log-level downgrades are mechanical. The demo hack removal eliminates code labeled TEMPORARY by its author. New tests validate that the skips behave as specified.

How did/will you validate this change?

  • Tests — New unit/integration tests directly assert each of #8206's acceptance criteria. Existing tests (TestApplyDMLEventQueriesMoveTablesMode, TestApplyIterationMoveTableCopyQueries, builder tests) continue to pass.
  • Tests — Full go test ./go/logic/... -count=1 passes (with known pre-existing failures noted below).
  • Other: go build ./... clean. gofmt clean.
  • Other — Manual smoke: a --move-tables run against the local test bed (multi-cluster docker-compose in gh-ost-tablemove-poc) completes row copy without creating _gho or _ghc tables and exits cleanly. No "ghost table migrated", "changelog", or "heartbeat" strings appear in the run output (#8206 criterion 2).

Known pre-existing test failures (not introduced by this PR)

These tests fail on the base branch move-tables/1.2-skip-ghost-tables for the same root cause: the GetTargetDatabaseName() inverted-branches bug at context.go:411 returns an empty string in move-tables mode, producing malformed DML query strings (e.g. ._test_gho instead of test._test_gho):

  • TestApplierBuildDMLEventQuery (delete / insert / update variants)
  • TestApplyDMLEventQueries
  • TestApplyDMLEventQueriesMoveTablesMode
  • Several PanicOnWarnings tests with the same root cause

This bug affects prepareQueries() which is #8207 / §1.3 scope (already noted in "Explicitly NOT in this PR" above). The fix belongs in the #8207 PR. To verify these failures are pre-existing, run go test ./go/logic/... -count=1 on the base branch move-tables/1.2-skip-ghost-tables before and after this PR — the same tests fail in both states.

Are there related full stack changes?

  • No — this is a refactor inside the gh-ost Go binary.

If something goes wrong, what are the mitigation and rollback strategies?

  • Rollback — Revert the merge commit. The change is contained to gh-ost itself, no external dependents. Reverting restores the prototype-hacked state on move-tables/1.2-skip-ghost-tables.
  • Operator workaround — If a move-tables run regresses, operators can simply not pass --move-tables; standard --alter migrations are completely unaffected.

Reviewers: This is the productionization of #8206 (skip ghost/changelog/heartbeat in move-tables mode). The target-table creation in initiateApplier()'s move-tables branch is left as prototype-hacked because that's #8207's scope — please flag if you think the boundary is wrong.

womoruyi added 2 commits June 5, 2026 19:04
- Replace WriteChangelog bypass TODO with intentional comment explaining
  the single-chokepoint design decision
- Remove demo hack code: simulateMoveTablesCutover(), TmpCutoverFilename
  field, --target-cutover-filename flag, and file-based cutover polling loop
- Downgrade per-chunk/per-event debug logging from Info to Debugf
- Fix pre-existing test call-sites for changed function signatures (nil args)
- Add tests for all #8206 acceptance criteria:
  - No ghost or changelog table created in move-tables mode
  - WriteChangelog is a no-op in move-tables mode
  - No changelog listener registered on the streamer
  - Heartbeat goroutine not started
  - finalCleanup skips drop operations
- Replace WriteChangelog bypass TODO with intentional comment explaining
  the single-chokepoint design decision
- Remove demo hack code: simulateMoveTablesCutover(), TmpCutoverFilename
  field, --target-cutover-filename flag, and file-based cutover polling loop
- Downgrade per-chunk/per-event debug logging from Info to Debugf
- Fix pre-existing test call-sites for changed function signatures (nil args)
- Fix test teardown to drop changelog table between tests
- Add tests for all #8206 acceptance criteria:
  - No ghost or changelog table created in move-tables mode
  - WriteChangelog is a no-op in move-tables mode
  - No changelog listener registered on the streamer
  - Heartbeat goroutine not started
  - finalCleanup skips drop operations
Copilot AI review requested due to automatic review settings June 5, 2026 19:20
@womoruyi womoruyi changed the base branch from master to move-tables/1.2-skip-ghost-tables June 5, 2026 19:23
@womoruyi womoruyi requested review from danieljoos and ericyan June 5, 2026 19:24
Contributor

Copilot AI left a comment

Pull request overview

This PR refactors and “productionizes” the move-tables Phase 1.2 behavior in gh-ost, aiming to skip ghost/changelog/heartbeat machinery when running in --move-tables mode, while introducing new SQL builders and plumbing to copy data across clusters.

Changes:

  • Add move-tables-specific SQL query builders for chunked SELECT and multi-row INSERT.
  • Introduce/extend move-tables mode wiring across MigrationContext, CLI flags, inspector/applier, and migrator flow.
  • Add/adjust tests and benchmarks around the new builders and move-tables behaviors, plus update call sites for new function signatures.
| File | Description |
| --- | --- |
| go/sql/builder.go | Adds move-tables copy SELECT/INSERT query builders. |
| go/sql/builder_test.go | Adds tests/benchmarks for the new move-tables query builders. |
| go/logic/test_utils.go | Adds helpers/constants for a second test database (move-tables). |
| go/logic/streamer.go | Adds an Info log when registering listeners. |
| go/logic/migrator.go | Adds/updates move-tables migration flow and mode-specific behavior gates. |
| go/logic/migrator_test.go | Updates tests for new applier method signatures. |
| go/logic/inspect.go | Uses a move-tables-aware “original table name” helper and improves error wrapping/logging. |
| go/logic/applier.go | Implements move-tables target DB connections, query preparation, no-op changelog, and copy-loop code. |
| go/logic/applier_test.go | Adds move-tables tests and extends suite to use a second database connection. |
| go/cmd/gh-ost/main.go | Adds --move-tables CLI entrypoint and target-connection flags. |
| go/base/context.go | Adds move-tables context fields and target table/database helpers, plus credential handling. |

Copilot's findings

Comments suppressed due to low confidence (9)

go/base/context.go:413

  • GetTargetDatabaseName() appears to have its branches inverted: in move-tables mode it currently returns the source DatabaseName, and outside move-tables it returns MoveTables.TargetDatabase (often empty). This breaks the new move-tables targeting and also contradicts the function’s own comment.
func (mctx *MigrationContext) GetTargetDatabaseName() string {
	if mctx.IsMoveTablesMode() {
		return mctx.DatabaseName
	}
	return mctx.MoveTables.TargetDatabase
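For illustration only, un-inverting the branches might look like the sketch below. The struct layout and the way `IsMoveTablesMode()` is derived (from a non-empty `TableNames` slice) are assumptions made for this example, not the actual gh-ost definitions; the PR description leaves the real fix to #8207.

```go
package main

import "fmt"

// moveTablesConfig and migrationContext are hypothetical, trimmed-down
// stand-ins for the real context types.
type moveTablesConfig struct {
	TableNames     []string
	TargetDatabase string
}

type migrationContext struct {
	DatabaseName string
	MoveTables   moveTablesConfig
}

// IsMoveTablesMode is assumed here to mean "at least one table to move".
func (m *migrationContext) IsMoveTablesMode() bool {
	return len(m.MoveTables.TableNames) > 0
}

// GetTargetDatabaseName returns the target database in move-tables mode and
// the source database otherwise, i.e. the two branches swapped relative to
// the snippet quoted in the finding.
func (m *migrationContext) GetTargetDatabaseName() string {
	if m.IsMoveTablesMode() {
		return m.MoveTables.TargetDatabase
	}
	return m.DatabaseName
}

func main() {
	standard := &migrationContext{DatabaseName: "src"}
	move := &migrationContext{
		DatabaseName: "src",
		MoveTables:   moveTablesConfig{TableNames: []string{"t"}, TargetDatabase: "dst"},
	}
	fmt.Println(standard.GetTargetDatabaseName())
	fmt.Println(move.GetTargetDatabaseName())
}
```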

go/base/context.go:969

  • ApplyCredentials() overwrites MoveTables.ConnectionConfig.User/Password with TargetUser/TargetPass even when those flags are empty, but the CLI help says they should default to the source credentials. As written, --move-tables with omitted --target-user/--target-password will attempt to connect with empty credentials.
	if mctx.IsMoveTablesMode() {
		// apply credentials for the applier from target CLI args
		if mctx.MoveTables.ConnectionConfig == nil {
			mctx.MoveTables.ConnectionConfig = &mysql.ConnectionConfig{}
		}
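One way to get the defaulting behavior the CLI help promises is to override each field only when a target value was actually supplied. The sketch below assumes a hypothetical `connectionConfig` struct and `targetCredentials` helper; neither name comes from gh-ost.

```go
package main

import "fmt"

// connectionConfig is a hypothetical stand-in holding only the credentials.
type connectionConfig struct {
	User     string
	Password string
}

// targetCredentials starts from the source credentials and overrides a field
// only when the corresponding target value is non-empty.
func targetCredentials(source connectionConfig, targetUser, targetPass string) connectionConfig {
	out := source
	if targetUser != "" {
		out.User = targetUser
	}
	if targetPass != "" {
		out.Password = targetPass
	}
	return out
}

func main() {
	source := connectionConfig{User: "gh-ost", Password: "secret"}
	fmt.Println(targetCredentials(source, "", "").User)         // falls back to the source user
	fmt.Println(targetCredentials(source, "mover", "pw2").User) // explicit target user wins
}
```

A caveat of this approach is that an intentionally empty target password cannot be distinguished from an omitted flag; whether that matters depends on how the flags are parsed.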

go/logic/applier.go:727

  • CreateTriggersOnGhost() always returns a non-nil error because it wraps err unconditionally (even when err is nil). This will make trigger creation appear to fail even on success.
// CreateTriggers creates the original triggers on applier host
func (apl *Applier) CreateTriggersOnGhost() error {
	err := apl.createTriggers(apl.migrationContext.GetGhostTableName())
	return fmt.Errorf("error creating triggers on ghost table: %v", err)
}
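The failure mode is easy to reproduce in isolation: `fmt.Errorf` always returns a non-nil error, even when the value being formatted is nil. The helper names `wrapAlways` and `wrapIfErr` below are made up for this sketch; the second shows one conventional way to wrap only on failure, using `%w` so callers can still match the underlying error.

```go
package main

import (
	"errors"
	"fmt"
)

// wrapAlways mirrors the pattern in the finding: it wraps unconditionally.
func wrapAlways(err error) error {
	return fmt.Errorf("error creating triggers on ghost table: %v", err)
}

// wrapIfErr wraps only when there is an error, and uses %w so that
// errors.Is / errors.As can still see the cause.
func wrapIfErr(err error) error {
	if err != nil {
		return fmt.Errorf("error creating triggers on ghost table: %w", err)
	}
	return nil
}

func main() {
	cause := errors.New("boom")
	fmt.Println(wrapAlways(nil) != nil)            // true: success is reported as failure
	fmt.Println(wrapIfErr(nil) == nil)             // true
	fmt.Println(errors.Is(wrapIfErr(cause), cause)) // true
}
```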

go/logic/applier.go:414

  • createTargetTable() builds the CREATE TABLE ... LIKE query using migrationContext.OriginalTableName, but move-tables mode intentionally uses MoveTables.TableNames[0] and may leave OriginalTableName empty. Using apl.originalTableName() here keeps the source table reference consistent across modes.
	query := fmt.Sprintf(`create /* gh-ost */ table %s.%s like %s.%s`,
		sql.EscapeName(targetDatabase),
		sql.EscapeName(targetTableName),
		sql.EscapeName(apl.migrationContext.DatabaseName),
		sql.EscapeName(apl.migrationContext.OriginalTableName),
	)

go/logic/applier.go:1169

  • ApplyIterationMoveTableCopyQueries() dereferences sourceDB (Query) without a nil check. Tests currently call it with nil, and any future caller error will panic instead of returning a normal error.
func (apl *Applier) ApplyIterationMoveTableCopyQueries(sourceDB *gosql.DB) (chunkSize int64, rowsAffected int64, duration time.Duration, err error) {
	startTime := time.Now()
	chunkSize = atomic.LoadInt64(&apl.migrationContext.ChunkSize)

	// First, select data from the source database:

go/logic/applier.go:1213

  • ApplyIterationMoveTableCopyQueries() always builds/executes an INSERT, even when the SELECT returns 0 rows. In that case MoveTableCopyInsertQueryBuilder.BuildQuery() produces a statement ending in values with no value list, which will fail at runtime.
	// Then, insert data into the destination database:
	sqlResult, err := func() (gosql.Result, error) {
		query, explodedArgs, err := apl.moveTablesCopyInsertQueryBuilder.BuildQuery(rows)
		if err != nil {
			return nil, err
		}
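To make the empty-chunk problem concrete, here is a small sketch of a multi-row INSERT builder that reports when there is nothing to insert. `buildInsert` is a hypothetical function, not the real `MoveTableCopyInsertQueryBuilder`, and it skips identifier escaping for brevity.

```go
package main

import (
	"fmt"
	"strings"
)

// buildInsert builds "insert into t (cols) values (?, ?), (?, ?)" for the
// given rows. It returns ok=false for an empty chunk so the caller can skip
// the INSERT instead of sending a statement that ends in a bare "values".
func buildInsert(table string, columns []string, rows [][]interface{}) (query string, args []interface{}, ok bool) {
	if len(rows) == 0 {
		return "", nil, false
	}
	placeholder := "(" + strings.TrimSuffix(strings.Repeat("?, ", len(columns)), ", ") + ")"
	groups := make([]string, len(rows))
	args = make([]interface{}, 0, len(rows)*len(columns))
	for i, row := range rows {
		groups[i] = placeholder
		args = append(args, row...)
	}
	query = fmt.Sprintf("insert into %s (%s) values %s",
		table, strings.Join(columns, ", "), strings.Join(groups, ", "))
	return query, args, true
}

func main() {
	rows := [][]interface{}{{1, 10}, {2, 20}}
	query, args, _ := buildInsert("t", []string{"id", "item_id"}, rows)
	fmt.Println(query)
	fmt.Println(len(args))

	_, _, ok := buildInsert("t", []string{"id", "item_id"}, nil)
	fmt.Println(ok) // false: nothing to insert, so no statement is built
}
```

An equivalent choice would be to return early from the copy loop as soon as the SELECT yields zero rows, leaving the builder untouched.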

go/logic/migrator.go:530

  • In standard Migrate() mode this log message now says “target table” (and “target table migrated”), but this code path is waiting on mgtr.ghostTableMigrated and the rest of the flow still operates on ghost/changelog tables. This makes operator output misleading for normal gh-ost schema migrations.
	initialLag, _ := mgtr.inspector.getReplicationLag()
	if !mgtr.migrationContext.Resume {
		mgtr.migrationContext.Log.Infof("Waiting for target table to be migrated. Current lag is %+v", initialLag)
		<-mgtr.ghostTableMigrated
		mgtr.migrationContext.Log.Debugf("target table migrated")

go/logic/applier_test.go:1795

  • TestApplyIterationMoveTableCopyQueries() ignores the error from applier.prepareQueries(); if builder initialization fails the test will proceed and likely panic later with a nil query builder. This should assert NoError to fail fast with a helpful message.
	applier := NewApplier(migrationContext)
	applier.prepareQueries()
	defer applier.Teardown()

go/logic/migrator.go:917

  • The PR description says simulateMoveTablesCutover() demo hack was removed, but this function still exists and hardcodes the source primary port to 3307. Keeping this prototype-only logic in-tree (even if unused) makes it easier to accidentally invoke later and perform an unexpected RENAME on the source cluster.
	// manually hack the `mysql-source-primary` connection config based on test bed settings
	// this is just for demo purposes... I'm sorry.
	primaryConnectionConfig := mgtr.inspector.connectionConfig.Duplicate()
	primaryConnectionConfig.Key.Port = 3307

  • Files reviewed: 6/6 changed files
  • Comments generated: 7

Comment thread go/logic/migrator.go
Comment on lines 891 to 893
	if err := mgtr.finalCleanup(); err != nil {
		return nil
	}
Comment thread go/logic/applier_test.go
Comment thread go/logic/applier_test.go Outdated
Comment thread go/logic/applier_test.go Outdated
Comment thread go/logic/applier_test.go Outdated
Comment thread go/logic/applier_test.go Outdated
Comment thread go/logic/migrator.go
Comment on lines 1817 to 1819
	if err != nil {
		return fmt.Errorf("ApplyIterationInsertQuery failed: %w", err) // wrapping call will retry
	}
womoruyi added 2 commits June 5, 2026 21:48
- Fix TearDownTest assertion ordering: otherDB drop was missing NoError
  check after _ghc drop line was inserted
- Revert Migrate() log strings to 'ghost table' — the 'target table'
  wording was a regression affecting standard (non-move-tables) users
- Fix ApplyIterationMoveTableCopyQueries test to pass suite.db instead
  of nil to avoid potential panic on sourceDB.Query()
Explain why TestFinalCleanup, TestInitiateStreaming, and TestNoHeartbeat
tests verify the guard predicate rather than calling the full function,
and which other tests provide the behavioral proof.
- Remove unused simulateMoveTablesCutover() function (missed in demo hack cleanup)
- Fix errorlint: use %w instead of %v in fmt.Errorf (3 sites)
- Fix ineffassign: suppress unused err from CREATE DATABASE IF NOT EXISTS
- Fix rowserrcheck: check sqlRows.Err() after scan loop in ApplyIterationMoveTableCopyQueries
- Fix whitespace: remove trailing newline after ApplierConnectionConfig assignment
Contributor

Copilot AI left a comment

Copilot's findings

  • Files reviewed: 6/6 changed files
  • Comments generated: 8

Comment thread go/logic/migrator.go
Comment on lines 891 to 893
	if err := mgtr.finalCleanup(); err != nil {
		return nil
	}
Comment thread go/logic/applier.go
Comment on lines 754 to +758
func (apl *Applier) WriteChangelog(hint, value string) (string, error) {
	// TODO(chriskirkland): move this bypass higher
	// In move-tables mode, there is no changelog table (§1.2). All changelog
	// writes are no-ops. This is a single chokepoint rather than per-caller
	// guards to prevent drift when new callers are added.
Comment thread go/logic/applier_test.go Outdated
Comment thread go/logic/applier_test.go
Comment on lines +398 to +402
	suite.Require().False(applier.tableExists("_testing_gho"), "ghost table should not exist in move-tables mode")
	suite.Require().False(applier.tableExists("_testing_ghc"), "changelog table should not exist in move-tables mode")

	//Verify move-tables mode seeds columns from the source table
	suite.Require().Equal(sql.NewColumnList([]string{"id", "item_id"}), migrationContext.OriginalTableColumnsOnApplier)
Comment thread go/logic/applier_test.go Outdated
Comment thread go/logic/applier_test.go
Comment on lines +482 to +487
// initiateStreaming() requires a binlog-capable MySQL connection to call directly.
// This test verifies the IsMoveTablesMode() predicate and that a new streamer starts with
// zero listeners. The actual skip is proven indirectly: if a changelog listener were registered,
// it would try to read events from a nonexistent _ghc table and fail during the full run.
func (suite *ApplierTestSuite) TestInitiateStreamingMoveTablesMode_NoChangelogListener() {
migrationContext := newTestMigrationContext()
Comment thread go/logic/applier_test.go
Comment on lines +500 to +504
// initiateApplier() requires a full migrator to call directly.
// This test verifies the IsMoveTablesMode() predicate that gates InitiateHeartbeat().
// Even if heartbeat ran, TestWriteChangelogNoOpInMoveTablesMode proves WriteChangelog
// is a no-op, so no SQL would execute against a nonexistent changelog table.
func (suite *ApplierTestSuite) TestNoHeartbeatInMoveTablesMode() {
Comment thread go/logic/applier.go Outdated
womoruyi added 2 commits June 5, 2026 22:15
- Remove contradictory TODO above WriteChangelog bypass (the comment
  now explains the bypass is intentional, so the TODO was misleading)
- Fix typo: #82066 → #8206 in test comment
- Update WriteChangelog doc comment to reflect empty-string return in
  move-tables mode
- Improve test comments: fix flawed reasoning in TestFinalCleanup,
  clarify TestStreaming tautology, add test-infrastructure note to
  TestHeartbeat
