Skip to content

fix(copy): apply --connection-mode to CopyOnMaster path#33

Open
talmacschen-arch wants to merge 4 commits into
cloudberry-contrib:mainfrom
talmacschen-arch:fix/connection-mode-pull-onmaster
Open

fix(copy): apply --connection-mode to CopyOnMaster path#33
talmacschen-arch wants to merge 4 commits into
cloudberry-contrib:mainfrom
talmacschen-arch:fix/connection-mode-pull-onmaster

Conversation

@talmacschen-arch
Copy link
Copy Markdown
Collaborator

fix #32


Change logs

--connection-mode pull was advertised as "destination connects to source",
but the CopyOnMaster strategy ignored the flag and always made the source
master dial the destination master. Two table classes are forced into that
strategy — DISTRIBUTED REPLICATED tables (hardcoded) and tables with
rows <= --on-segment-threshold (default 1,000,000) — so in any topology
where the destination can only be reached as a dialer (e.g. destination
cluster running inside Kubernetes, source outside), cbcopy hits one of
those tables and dies with a TCP failure or, in the special case of
pull + small/replicated tables, with relation "public.cbcopy_ports_temp_onmaster_<ts>" does not exist.

This PR makes --connection-mode apply uniformly to every copy strategy.

What changes

  • copy/copy_command.go — add the if ConnectionMode == ConnectionModePull branch to CopyOnMaster.CopyTo / CopyFrom (symmetric to the existing CopyOnSegment branches), and have FormMasterHelperAddress pick --source-host vs --dest-host by mode. The SQL-building bodies are split into unexported formatCopyToCommand / formatCopyFromCommand so they are unit-testable without a DB.
  • copy/copy_operation.go — drop the || op.command.IsMasterCopy() clause from the three branch points in CopyOperation.Execute. After the change above, the listener side under pull + CopyOnMaster is src just like every other pull strategy, so the master path no longer needs its own override. The IsMasterCopy() call at line 89 (which decides whether to query the _onmaster_ or _onall_ port temp table) is intentionally retained — it is orthogonal to connection mode.
  • copy/copy.go and copy/copy_port_helper.go — unchanged. The existing initializeClusterResources already picks srcManageConn as the listener under pull, which is now correct for both _onmaster_ and _onall_ external tables; the previously-needed "fix the port temp table location" turns out to fall out for free.

Behavior matrix after this PR

connection mode strategy listener dialer port temp table on net path required
push CopyOnMaster dest master src master dest src → dest:5432 + helper ports
push CopyOnSegment / ExtDest* dest segs src segs dest same
pull CopyOnMaster src master dest master src dest → src:5432 + helper ports
pull CopyOnSegment / ExtDest* src segs dest segs src same

The previously-broken cell is the bolded one.

User-visible doc note

Under --connection-mode=pull, --source-host must be an address reachable from destination cluster nodes (a 127.0.0.1 literal works only when the destination sees itself as the same host). This is the symmetric constraint to push mode's --dest-host, and is now spelled out in the flag's --help text and in the README's "Hostname Resolution" subsection.

Tests

  • Unit (copy/copy_command_test.go, new) — table-driven, asserts the SQL emitted by CopyOnMaster in all four (push/pull) × (CopyTo/CopyFrom) quadrants. Each case checks both required tokens (--listen, --host, --port, --data-port-range, --direction send/receive, --cmd-id) and tokens that must be absent on the corresponding branch, so a regression where the two branches build identical strings would fail loudly.
  • E2E (end_to_end/basic_test.go, new It block) — runs cbcopy --connection-mode pull with deliberately distinct literal values for --source-host (127.0.0.1) and --dest-host (localhost) that both resolve to the coordinator on the test cluster, forces TEST_COPY_STRATEGY=CopyOnMaster, and grep-asserts the debug log: the sending side must contain --listen and not --host, the receiving side must contain --host 127.0.0.1 (i.e. the source value) and not --listen. The pre-fix code would emit the mirror image and fail at all four token-level assertions. Note: the existing {CopyOnMaster, pull} entry in the table-driven loop did not catch this regression on a single-cluster test bed because --source-host and --dest-host come from the same env-derived value there.
  • Manual — verified on a real multi-host SynxDB4 / Cloudberry 2.1.0 cluster (4 hosts) running make end_to_end.

Contributor's checklist

CopyOnMaster.CopyTo/CopyFrom previously hardcoded the data-connection
direction to src master -> dest master regardless of --connection-mode,
so users selecting `pull` for dest-only-reachable topologies (e.g. dest
GP in K8S, src outside) still saw src trying to dial dest for any
replicated table or any table with rows <= --on-segment-threshold.

This adds symmetric push/pull branches to CopyOnMaster, makes
FormMasterHelperAddress pick --source-host vs --dest-host by mode, and
removes the now-incorrect `|| op.command.IsMasterCopy()` special case
from CopyOperation.Execute (the listener side under pull+master is now
src, just like other pull strategies, so the master path no longer
needs its own override).

The helper-port temp table mismatch in pull+CopyOnMaster (built on src,
queried from dest) goes away as a consequence -- under the new code the
listener is on src for both _onmaster_ and _onall_ in pull mode, so the
existing single-listenConn setup in initializeClusterResources is
already correct without change.

Closes cloudberry-contrib#32
Adds table-driven unit tests for the four (push/pull) x (CopyTo/CopyFrom)
quadrants of CopyOnMaster, plus a focused test for FormMasterHelperAddress
verifying it picks --source-host vs --dest-host by connection mode.

Tests assert both required tokens (--listen, --host, --port,
--data-port-range, --direction send/receive, --cmd-id) and tokens that
must be absent on each branch, guarding against a regression where both
branches accidentally build the same string.

To make the assertions possible without a live DB connection, the SQL
formatting in CopyOnMaster.CopyTo/CopyFrom is split out into
unexported formatCopyToCommand/formatCopyFromCommand methods. CopyTo
and CopyFrom now just call those and execute the result -- behavior is
unchanged.

The existing end_to_end/basic_test.go {CopyOnMaster, pull} case did not
catch the original bug because its src and dest are two databases on
the same master process sharing /tmp; these unit tests do not require
that setup and cover the divergence between push and pull branches at
the SQL-string level.
The existing {CopyOnMaster, pull} entry in the table-driven loop did not
catch issue cloudberry-contrib#32 in a single-cluster test bed: both --source-host and
--dest-host get filled from the same env-derived value, so a regression
where pull silently routes through the push branch produces identical
output. This adds a focused It block that:

  * passes literally distinct values for --source-host (127.0.0.1) and
    --dest-host (localhost) that still resolve to the same coordinator,
  * runs cbcopy with --debug so the per-table "COPY command of ..." lines
    are visible, and
  * asserts on those lines that the pull branch was actually taken:
    sending side has --listen and no --host; receiving side has
    --host 127.0.0.1 (the --source-host value) and no --listen.

The pre-fix code would emit the mirror image and four token-level
assertions would each fail on a different argument, making the
regression mode obvious.

A tiny helper findLineContaining is added at file scope; the test
package already imports strings.
Under --connection-mode push (the default) the source cluster dials the
destination, so --dest-host has to be reachable from source nodes -- a
constraint the README's Hostname Resolution note already covers. Under
--connection-mode pull, the direction reverses and --source-host has to
be reachable from destination nodes; that direction was not documented
anywhere, which made the K8s-style "destination is the only side that
can dial" use case easy to misconfigure (e.g. passing
--source-host 127.0.0.1 when cbcopy runs on the source master).

This patch:

  * adds the reachability sentence to each flag's --help description in
    copy/copy.go and mirrors it into the verbatim help dump in
    README.md, and
  * appends a symmetric paragraph in the README's Hostname Resolution
    subsection so pull-mode users see the constraint where they would
    look for push-mode users.

No code behavior change.
@talmacschen-arch
Copy link
Copy Markdown
Collaborator Author

Hi @gfphoenix78, sorry to bother you — thanks again for the quick turnaround on #35!

Whenever you have a spare moment, would you mind taking a look at this one as well? It's a small fix scoped to making CopyOnMaster honor --connection-mode pull (previously it always went through push regardless of the flag). I've verified it on a 4-host SynxDB4 cluster end-to-end and added unit + e2e coverage. No rush at all — just flagging it in case it slipped off the radar.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] --connection-mode pull is not applied to the CopyOnMaster path (replicated / small tables fail in dest-only-reachable topologies)

1 participant