fix(copy): apply --connection-mode to CopyOnMaster path#33
Open
talmacschen-arch wants to merge 4 commits into
Open
fix(copy): apply --connection-mode to CopyOnMaster path#33talmacschen-arch wants to merge 4 commits into
talmacschen-arch wants to merge 4 commits into
Conversation
CopyOnMaster.CopyTo/CopyFrom previously hardcoded the data-connection direction to src master -> dest master regardless of --connection-mode, so users selecting `pull` for dest-only-reachable topologies (e.g. dest GP in K8S, src outside) still saw src trying to dial dest for any replicated table or any table with rows <= --on-segment-threshold. This adds symmetric push/pull branches to CopyOnMaster, makes FormMasterHelperAddress pick --source-host vs --dest-host by mode, and removes the now-incorrect `|| op.command.IsMasterCopy()` special case from CopyOperation.Execute (the listener side under pull+master is now src, just like other pull strategies, so the master path no longer needs its own override). The helper-port temp table mismatch in pull+CopyOnMaster (built on src, queried from dest) goes away as a consequence -- under the new code the listener is on src for both _onmaster_ and _onall_ in pull mode, so the existing single-listenConn setup in initializeClusterResources is already correct without change. Closes cloudberry-contrib#32
Adds table-driven unit tests for the four (push/pull) x (CopyTo/CopyFrom)
quadrants of CopyOnMaster, plus a focused test for FormMasterHelperAddress
verifying it picks --source-host vs --dest-host by connection mode.
Tests assert both required tokens (--listen, --host, --port,
--data-port-range, --direction send/receive, --cmd-id) and tokens that
must be absent on each branch, guarding against a regression where both
branches accidentally build the same string.
To make the assertions possible without a live DB connection, the SQL
formatting in CopyOnMaster.CopyTo/CopyFrom is split out into
unexported formatCopyToCommand/formatCopyFromCommand methods. CopyTo
and CopyFrom now just call those and execute the result -- behavior is
unchanged.
The existing end_to_end/basic_test.go {CopyOnMaster, pull} case did not
catch the original bug because its src and dest are two databases on
the same master process sharing /tmp; these unit tests do not require
that setup and cover the divergence between push and pull branches at
the SQL-string level.
The existing {CopyOnMaster, pull} entry in the table-driven loop did not
catch issue cloudberry-contrib#32 in a single-cluster test bed: both --source-host and
--dest-host get filled from the same env-derived value, so a regression
where pull silently routes through the push branch produces identical
output. This adds a focused It block that:
* passes literally distinct values for --source-host (127.0.0.1) and
--dest-host (localhost) that still resolve to the same coordinator,
* runs cbcopy with --debug so the per-table "COPY command of ..." lines
are visible, and
* asserts on those lines that the pull branch was actually taken:
sending side has --listen and no --host; receiving side has
--host 127.0.0.1 (the --source-host value) and no --listen.
The pre-fix code would emit the mirror image and four token-level
assertions would each fail on a different argument, making the
regression mode obvious.
A tiny helper findLineContaining is added at file scope; the test
package already imports strings.
Under --connection-mode push (the default) the source cluster dials the
destination, so --dest-host has to be reachable from source nodes -- a
constraint the README's Hostname Resolution note already covers. Under
--connection-mode pull, the direction reverses and --source-host has to
be reachable from destination nodes; that direction was not documented
anywhere, which made the K8s-style "destination is the only side that
can dial" use case easy to misconfigure (e.g. passing
--source-host 127.0.0.1 when cbcopy runs on the source master).
This patch:
* adds the reachability sentence to each flag's --help description in
copy/copy.go and mirrors it into the verbatim help dump in
README.md, and
* appends a symmetric paragraph in the README's Hostname Resolution
subsection so pull-mode users see the constraint where they would
look for push-mode users.
No code behavior change.
2 tasks
Collaborator
Author
|
Hi @gfphoenix78, sorry to bother you — thanks again for the quick turnaround on #35! Whenever you have a spare moment, would you mind taking a look at this one as well? It's a small fix scoped to making |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
fix #32
Change logs
--connection-mode pullwas advertised as "destination connects to source",but the
CopyOnMasterstrategy ignored the flag and always made the sourcemaster dial the destination master. Two table classes are forced into that
strategy —
DISTRIBUTED REPLICATEDtables (hardcoded) and tables withrows <= --on-segment-threshold(default 1,000,000) — so in any topologywhere the destination can only be reached as a dialer (e.g. destination
cluster running inside Kubernetes, source outside), cbcopy hits one of
those tables and dies with a TCP failure or, in the special case of
pull + small/replicated tables, with
relation "public.cbcopy_ports_temp_onmaster_<ts>" does not exist.This PR makes
--connection-modeapply uniformly to every copy strategy.What changes
copy/copy_command.go— add theif ConnectionMode == ConnectionModePullbranch toCopyOnMaster.CopyTo/CopyFrom(symmetric to the existingCopyOnSegmentbranches), and haveFormMasterHelperAddresspick--source-hostvs--dest-hostby mode. The SQL-building bodies are split into unexportedformatCopyToCommand/formatCopyFromCommandso they are unit-testable without a DB.copy/copy_operation.go— drop the|| op.command.IsMasterCopy()clause from the three branch points inCopyOperation.Execute. After the change above, the listener side underpull + CopyOnMasterissrcjust like every other pull strategy, so the master path no longer needs its own override. TheIsMasterCopy()call at line 89 (which decides whether to query the_onmaster_or_onall_port temp table) is intentionally retained — it is orthogonal to connection mode.copy/copy.goandcopy/copy_port_helper.go— unchanged. The existinginitializeClusterResourcesalready pickssrcManageConnas the listener under pull, which is now correct for both_onmaster_and_onall_external tables; the previously-needed "fix the port temp table location" turns out to fall out for free.Behavior matrix after this PR
The previously-broken cell is the bolded one.
User-visible doc note
Under
--connection-mode=pull,--source-hostmust be an address reachable from destination cluster nodes (a127.0.0.1literal works only when the destination sees itself as the same host). This is the symmetric constraint to push mode's--dest-host, and is now spelled out in the flag's--helptext and in the README's "Hostname Resolution" subsection.Tests
copy/copy_command_test.go, new) — table-driven, asserts the SQL emitted byCopyOnMasterin all four(push/pull) × (CopyTo/CopyFrom)quadrants. Each case checks both required tokens (--listen,--host,--port,--data-port-range,--direction send/receive,--cmd-id) and tokens that must be absent on the corresponding branch, so a regression where the two branches build identical strings would fail loudly.end_to_end/basic_test.go, newItblock) — runscbcopy --connection-mode pullwith deliberately distinct literal values for--source-host(127.0.0.1) and--dest-host(localhost) that both resolve to the coordinator on the test cluster, forcesTEST_COPY_STRATEGY=CopyOnMaster, and grep-asserts the debug log: the sending side must contain--listenand not--host, the receiving side must contain--host 127.0.0.1(i.e. the source value) and not--listen. The pre-fix code would emit the mirror image and fail at all four token-level assertions. Note: the existing{CopyOnMaster, pull}entry in the table-driven loop did not catch this regression on a single-cluster test bed because--source-hostand--dest-hostcome from the same env-derived value there.make end_to_end.Contributor's checklist
git log)