Batch MPP claims into single ChannelMonitorUpdate by Abeeujah · Pull Request #4552 · lightningdevkit/rust-lightning

Abeeujah · 2026-04-11T17:42:15Z

Batch same-channel MPP claims into a single commitment

When an MPP payment has multiple parts that arrive over the same final-hop
channel, we previously processed each part sequentially through the full
commitment dance: claim one part, build a ChannelMonitorUpdate, ship a
commitment_signed, wait for the peer's RAA + next commitment_signed, then
move to the next part. This added significant claim latency and wasted
round trips on unnecessary monitor updates.

Instead, queue all MPP part claims into the channel's holding cell first,
then flush them together in a single pass. The holding-cell flush produces
one combined ChannelMonitorUpdate (preimage + commitment steps) and one
commitment_signed carrying all update_fulfill_htlc messages at once,
eliminating the intermediate round trips.
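
To make the queue-then-flush idea concrete, here is a minimal toy sketch. It is not LDK code: `ToyChannel`, `UpdateStep`, and `MonitorUpdate` are hypothetical stand-ins that only borrow the names used in this description, and the real holding cell, commitment, and monitor types are far richer.

```rust
// Toy model only: every type here is hypothetical and much simpler than LDK's.
#[derive(Debug, Clone, PartialEq)]
enum UpdateStep {
    PaymentPreimage { htlc_id: u64, preimage: [u8; 32] },
    LatestCommitment { fulfilled_htlcs: Vec<u64> },
}

#[derive(Debug, PartialEq)]
struct MonitorUpdate {
    update_id: u64,
    steps: Vec<UpdateStep>,
}

#[derive(Default)]
struct ToyChannel {
    latest_monitor_update_id: u64,
    holding_cell: Vec<(u64, [u8; 32])>,
}

impl ToyChannel {
    /// Queue a claim without building a monitor update or a commitment.
    fn queue_fulfill_htlc(&mut self, htlc_id: u64, preimage: [u8; 32]) {
        // A second claim for the same HTLC is treated as a duplicate and ignored.
        if !self.holding_cell.iter().any(|(id, _)| *id == htlc_id) {
            self.holding_cell.push((htlc_id, preimage));
        }
    }

    /// Flush every queued claim into one combined update:
    /// one preimage step per HTLC, then a single commitment step.
    fn flush_holding_cell(&mut self) -> Option<MonitorUpdate> {
        if self.holding_cell.is_empty() {
            return None;
        }
        let claims: Vec<(u64, [u8; 32])> = self.holding_cell.drain(..).collect();
        let mut steps: Vec<UpdateStep> = claims
            .iter()
            .map(|(htlc_id, preimage)| UpdateStep::PaymentPreimage {
                htlc_id: *htlc_id,
                preimage: *preimage,
            })
            .collect();
        steps.push(UpdateStep::LatestCommitment {
            fulfilled_htlcs: claims.iter().map(|(id, _)| *id).collect(),
        });
        self.latest_monitor_update_id += 1;
        Some(MonitorUpdate { update_id: self.latest_monitor_update_id, steps })
    }
}
```

In this model, queueing N parts and flushing once bumps the update ID a single time, whereas the sequential flow would bump it once per part.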

To enable this, the old get_update_fulfill_htlc_and_commit is split into
queue_fulfill_htlc (pushes into the holding cell) and the existing holding-
cell flush path. ClaimedMPPPayment RAA blockers are deliberately ignored
during both the queue and flush phases so the first MPP part on a channel
does not force subsequent parts onto the standalone preimage path.

When the channel cannot immediately flush its holding cell (e.g. peer is
disconnected or another monitor update is in flight), a standalone preimage-
only ChannelMonitorUpdate is persisted for HTLC-preimage safety, matching
the prior behavior.
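
The fallback rule above can be sketched as a small decision function. This is an illustration under assumptions, not the PR's code: `ToyChannelState` and `ClaimPersistence` are hypothetical, and only the two conditions named in the paragraph (peer disconnected, monitor update in flight) are modeled.

```rust
// Hypothetical, simplified view of the channel state relevant to this decision.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ToyChannelState {
    peer_connected: bool,
    monitor_update_in_progress: bool,
}

#[derive(Debug, PartialEq)]
enum ClaimPersistence {
    /// Preimages ride along with the commitment in one combined update.
    CombinedWithCommitment,
    /// Preimage is persisted on its own; the commitment follows later.
    StandalonePreimageOnly,
}

/// Toy version of "can the holding cell be flushed right now?".
fn can_flush_immediately(state: ToyChannelState) -> bool {
    state.peer_connected && !state.monitor_update_in_progress
}

/// Pick how the preimage reaches the monitor for a newly queued claim.
fn persistence_for_claim(state: ToyChannelState) -> ClaimPersistence {
    if can_flush_immediately(state) {
        ClaimPersistence::CombinedWithCommitment
    } else {
        ClaimPersistence::StandalonePreimageOnly
    }
}
```

Either branch leaves the preimage durably recorded before the claim is considered done; only the number of monitor updates differs.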

Tests have been updated to reflect the new batching of MPP claims:

test_single_channel_multiple_mpp
auto_retry_partial_failure
test_keysend_dup_hash_partial_mpp

closes #3986

ldk-reviews-bot · 2026-04-11T17:42:21Z

👋 Thanks for assigning @TheBlueMatt as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

ldk-claude-review-bot · 2026-04-11T17:56:05Z

I've completed a thorough re-review of the entire PR diff. All significant issues from my prior review remain outstanding. I found no new critical issues beyond what was previously flagged.

Re-review of PR #4552 — Batch MPP claims into single ChannelMonitorUpdate

All previously flagged issues remain relevant and unaddressed. No new critical bugs found in this pass.

Prior issues still outstanding (not re-posted inline):

Preimage loss window (channelmanager.rs:10191): Between the claim loop and the flush loop, preimages exist only in the holding cell (not yet persisted to the ChannelMonitor). A force-close in that window loses preimages. Fund-loss risk.
Spurious MONITOR_UPDATE_IN_PROGRESS (channel.rs:7624): claim_htlc_while_disconnected_dropping_mon_update_legacy calls get_update_fulfill_htlc directly (correctly avoiding queue_fulfill_htlc's side effects, per the added comment). This is resolved.
N completion actions for 1 monitor update (channelmanager.rs:9929, 10049): Each HTLC registers its own PaymentClaimed action + ClaimedMPPPayment RAA blocker, but they all map to a single combined ChannelMonitorUpdate. Results in N redundant handle_monitor_update_release calls.
AllDuplicateClaims startup replay (channelmanager.rs:10067): Doesn't re-register RAA blockers or completion actions needed for startup replay.
Missing monitor_updating_paused in deferred-flush path (channelmanager.rs:10267): Potential panic with synchronous persistence when channel is awaiting RAA at claim time.
check_free_peer_holding_cells vs pending_events (channelmanager.rs:14046): Doesn't check pending_events for ReleaseRAAChannelMonitorUpdate (comment added explaining reasoning, but edge cases remain).
Fail/claim conflict weakened safety net (channel.rs:7801): Preimage no longer persisted in the structurally-prevented fail/claim race edge case.
Missing test coverage (channel.rs:7917): No test exercises the deferred-flush / standalone preimage path (when channel can't generate new commitment at claim time).

Minor new observations:

channelmanager.rs:9532 — errs vector is declared but never populated after this PR's refactoring (dead code). The drain loop at line 9670 is a no-op. Should be cleaned up.
chanmon_update_fail_tests.rs:4736 — The test_single_channel_multiple_mpp test verifies the batched message structure but does not drive the payment to completion (no commitment_signed dance, no forwarding to intermediate nodes, no end-to-end validation). The old test, despite its complexity, validated the full flow.

ldk-reviews-bot · 2026-04-13T17:53:33Z

🔔 1st Reminder

Hey @jkczyz! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

Abeeujah · 2026-04-16T15:13:54Z

let _ = htlc_value_msat

It's intentionally left unused for now, pending a design decision on how the msat values in that field should be handled. For a single claim, there's a completion_action function that computes with the value. This case is a batch claim, so how batch claims should work is still open:

Do we sum all the values, since they were claimed in a single swoop, and pass the sum to completion_action?
Should a batch claim event be implemented, as there is for a single claim?

This is why the field exists; it is currently left unused because that direction has not been decided.
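
For reference, the first option (summing the parts) could look roughly like the sketch below. `total_claimed_msat` is a hypothetical helper, not something in the PR, and whether a sum is the right input for completion_action is exactly the open question.

```rust
/// Hypothetical helper: total value of all parts claimed in one batch, in msat.
/// Returns `None` if the sum would overflow a `u64`.
fn total_claimed_msat(part_values_msat: &[u64]) -> Option<u64> {
    part_values_msat
        .iter()
        .try_fold(0u64, |acc, value| acc.checked_add(*value))
}
```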

codecov · 2026-04-16T16:09:03Z

Codecov Report

❌ Patch coverage is 70.22901% with 78 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.36%. Comparing base (44828f7) to head (fc83c75).
⚠️ Report is 17 commits behind head on main.

Files with missing lines	Patch %	Lines
lightning/src/ln/channelmanager.rs	68.34%	59 Missing and 4 partials ⚠️
lightning/src/ln/channel.rs	76.19%	15 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4552      +/-   ##
==========================================
+ Coverage   86.12%   86.36%   +0.23%     
==========================================
  Files         157      158       +1     
  Lines      108922   109493     +571     
  Branches   108922   109493     +571     
==========================================
+ Hits        93812    94561     +749     
+ Misses      12495    12381     -114     
+ Partials     2615     2551      -64

Flag	Coverage Δ
fuzzing-fake-hashes	`5.06% <0.00%> (?)`
fuzzing-real-hashes	`22.89% <64.17%> (?)`
tests	`86.09% <70.22%> (-0.04%)`	⬇️

ldk-reviews-bot · 2026-04-16T22:09:27Z

🔔 1st Reminder

Hey @TheBlueMatt! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

ldk-reviews-bot · 2026-04-20T00:00:42Z

🔔 2nd Reminder

Hey @TheBlueMatt! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

ldk-reviews-bot · 2026-04-22T00:01:33Z

🔔 3rd Reminder

Hey @TheBlueMatt! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

TheBlueMatt

Hmm, would this not be simpler by using the holding cell instead? When we're failing, we call queue_fail_htlc which just shoves the failure into the holding cell then call check_free_peer_holding_cells to release them all at once. Rather than rebuilding that logic for claims, we should be able to do something similar. We'd have to handle the monitor updates with the preimages a bit differently, but not drastically so.

ldk-reviews-bot · 2026-05-18T00:00:07Z

🔔 1st Reminder

Hey @TheBlueMatt! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

TheBlueMatt · 2026-05-18T19:04:05Z

+		// When the caller asked us to push into the holding cell for batching
+		// (`force_holding_cell`) we skip producing a `ChannelMonitorUpdate` here; the caller is
+		// responsible for either flushing the holding cell (producing one combined update with
+		// every queued preimage and the commitment) or building a preimage-only update for


This is a massive layer violation - the ChannelManager shouldn't be making decisions around how to build ChannelMonitorUpdates, we shouldn't need a manual build_preimage_only_monitor_update, etc. Instead, we should be able to queue all the updates, then use the existing check_free_holding_cells logic (or check_free_peer_holding_cells to filter by the channels we actually updated) and have the channel automatically include the preimage update steps in the ChannelMonitorUpdate it builds.

ldk-claude-review-bot · 2026-05-29T20:49:56Z

+		let mon_update_id = self.context.latest_monitor_update_id;
+
+		self.queue_fulfill_htlc(htlc_id_arg, payment_preimage_arg, None, None, false, logger);
 		self.context.latest_monitor_update_id = mon_update_id;


Bug: Spurious MONITOR_UPDATE_IN_PROGRESS and corrupted blocked update IDs

queue_fulfill_htlc now has side effects that the old get_update_fulfill_htlc did not:

Since can_generate_new_commitment() is false (asserted above), will_flush_immediately = false inside queue_fulfill_htlc.

The !will_flush_immediately branch calls monitor_updating_paused(...) at line 7690, which sets MONITOR_UPDATE_IN_PROGRESS on the channel state.

If blocked_monitor_updates is non-empty, lines 7678-7684 shift all blocked update IDs up by 1.

The caller resets latest_monitor_update_id at line 7624, but does NOT:

Clear MONITOR_UPDATE_IN_PROGRESS (no monitor update was actually submitted, so nothing will call monitor_updating_restored to clear it)

Un-shift the blocked_monitor_updates IDs

The old code called get_update_fulfill_htlc directly, which did NOT call monitor_updating_paused and did NOT shift blocked update IDs when taking the holding-cell path. This is a regression.

Consequences:

MONITOR_UPDATE_IN_PROGRESS is set with no corresponding in-flight monitor update. After peer reconnection, can_generate_new_commitment() returns false (due to MONITOR_UPDATE_IN_PROGRESS), potentially preventing the channel from making progress.

If blocked_monitor_updates existed, their IDs are now off-by-one from what the ChannelMonitor expects, which could cause update ID mismatches.

In practice, this may be partially mitigated if a MonitorUpdatesComplete background event fires for this channel during startup (clearing the flag via channel_monitor_updated → monitor_updating_restored), but that's fragile and shouldn't be relied upon.

Fix: Avoid queue_fulfill_htlc here. Either call get_update_fulfill_htlc directly (as the old code did), or add explicit cleanup:

	self.queue_fulfill_htlc(htlc_id_arg, payment_preimage_arg, None, None, false, logger);
	self.context.latest_monitor_update_id = mon_update_id;
	// queue_fulfill_htlc may have set MONITOR_UPDATE_IN_PROGRESS and shifted
	// blocked_monitor_updates; undo both since we're discarding the update.
	self.context.channel_state.clear_monitor_update_in_progress();

(Plus un-shifting blocked updates if applicable, which is more complex — calling get_update_fulfill_htlc directly is simpler.)
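
The shape of this regression can be reproduced in a toy model. Everything below is hypothetical (`ToyContext` and its methods are not LDK types); it only mirrors the three side effects described in this comment: bumping the update ID, shifting blocked update IDs, and setting the in-progress flag.

```rust
// Hypothetical stand-in for the parts of the channel context discussed above.
#[derive(Debug, Default, Clone, PartialEq)]
struct ToyContext {
    latest_monitor_update_id: u64,
    monitor_update_in_progress: bool,
    blocked_monitor_update_ids: Vec<u64>,
}

impl ToyContext {
    /// Models the side effects attributed to `queue_fulfill_htlc` when the
    /// channel cannot flush immediately.
    fn queue_when_not_flushable(&mut self) {
        self.latest_monitor_update_id += 1;
        for id in self.blocked_monitor_update_ids.iter_mut() {
            *id += 1;
        }
        self.monitor_update_in_progress = true;
    }
}

/// Mirrors the flagged caller: only the update ID is restored.
fn claim_resetting_id_only(ctx: &mut ToyContext) {
    let saved_id = ctx.latest_monitor_update_id;
    ctx.queue_when_not_flushable();
    ctx.latest_monitor_update_id = saved_id;
}

/// Stand-in for a path that leaves none of the side effects behind, which is
/// what calling the side-effect-free function directly would achieve.
fn claim_restoring_snapshot(ctx: &mut ToyContext) {
    let snapshot = ctx.clone();
    ctx.queue_when_not_flushable();
    *ctx = snapshot;
}
```

With `latest_monitor_update_id = 5` and blocked IDs `[6, 7]`, the first function leaves the ID at 5 but the flag set and the blocked IDs at `[7, 8]`; the second leaves the context unchanged.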

ldk-claude-review-bot · 2026-05-29T21:23:17Z

+					None
+				}
+			})
+			.collect();


Bug (minor): check_free_peer_holding_cells does not check pending_events for ReleaseRAAChannelMonitorUpdate events when building the set of RAA-blocked channels. Compare with the raa_blocked computation in claim_mpp_part (line 9903-9920) and raa_monitor_updates_held (line 13068-13085), which both check pending_events in addition to actions_blocking_raa_monitor_updates.

If a ReleaseRAAChannelMonitorUpdate event is pending but no actions_blocking_raa_monitor_updates entry exists for the channel, check_free_peer_holding_cells will attempt to flush the holding cell. The flush's combined ChannelMonitorUpdate will then be blocked by raa_monitor_updates_held inside handle_new_monitor_update (because it DOES check pending_events). The end result is correct (preimage is in the blocked update), but the discrepancy is confusing and could mask issues.

More importantly, the comment on line 14012-14015 says "The preimage half of any queued claim has already been persisted as a standalone monitor update by queue_fulfill_htlc", but this is only true when raa_blocked was true in queue_fulfill_htlc. If queue_fulfill_htlc's raa_blocked check at line 9892-9920 saw no blocker (perhaps the ReleaseRAAChannelMonitorUpdate event was added between the queue and flush), the preimage was deferred to the flush. Then the flush produces a combined update that gets blocked by raa_monitor_updates_held — meaning the preimage sits in a blocked_monitor_updates queue in memory, not yet durably persisted to the ChannelMonitor. A crash before the blocker clears would lose the preimage.

Consider also checking pending_events for ReleaseRAAChannelMonitorUpdate in this function, consistent with the other two call sites.

The discrepancy is intentional rather than a bug: this call site's purpose is just to avoid wasting work generating a commitment_signed that the peer will reject (or that we can't usefully proceed with). It's not part of the preimage-safety invariant.

Adding the pending_events check here would incorrectly suppress legitimate flushes, since the presence of a ReleaseRAAChannelMonitorUpdate event only blocks future inbound-RAA-derived monitor updates, not outbound commitment generation.
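
The difference between the two checks discussed in this thread can be illustrated with a toy model. `ToyManagerState` and both methods are hypothetical; the field names only echo the identifiers mentioned above, and the real structures are keyed and typed differently.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical channel identifier for this sketch.
type ToyChannelId = u32;

#[derive(Default)]
struct ToyManagerState {
    /// Channels with registered RAA blockers (blocker names are placeholders).
    actions_blocking_raa_monitor_updates: HashMap<ToyChannelId, Vec<&'static str>>,
    /// Channels with a pending release event that has not been processed yet.
    pending_release_raa_events: HashSet<ToyChannelId>,
}

impl ToyManagerState {
    /// The narrower check: only registered blockers count.
    fn blocked_by_actions(&self, chan: ToyChannelId) -> bool {
        self.actions_blocking_raa_monitor_updates
            .get(&chan)
            .map_or(false, |blockers| !blockers.is_empty())
    }

    /// The wider check: a pending release event also counts as blocking.
    fn blocked_by_actions_or_events(&self, chan: ToyChannelId) -> bool {
        self.blocked_by_actions(chan) || self.pending_release_raa_events.contains(&chan)
    }
}
```

A channel that has only a pending release event is unblocked under the first check and blocked under the second, which is the case the two comments above disagree about.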

ldk-claude-review-bot · 2026-05-29T22:13:36Z

-								htlc_value_msat,
-								update_blocked: true,
-							};
+							return UpdateFulfillFetch::DuplicateClaim {};


Bug (low severity): Behavioral change weakens preimage-safety net in fulfill/fail race

The old code for this branch returned UpdateFulfillFetch::NewClaim { monitor_update, htlc_value_msat, update_blocked: true } — the preimage-containing ChannelMonitorUpdate was built before the holding-cell check and persisted to the monitor even though the HTLC was also being failed. The new code returns DuplicateClaim, so the preimage is never persisted to the ChannelMonitor.

If in release mode (where the debug_assert doesn't fire) the channel is force-closed and the counterparty broadcasts a commitment that includes this HTLC, the monitor won't have the preimage to claim on-chain.

In practice this should be unreachable because claim_funds and the fail path don't race (both are gated by claimable_payments), so the debug_assert catches it in testing. But the safety net is now weaker.

Consider persisting the preimage even in this edge case, or adding a comment explicitly documenting that this is safe because the race is structurally prevented.

ldk-reviews-bot requested a review from jkczyz April 11, 2026 17:52

ldk-claude-review-bot reviewed Apr 11, 2026
Comment thread lightning/src/ln/channelmanager.rs Outdated

ldk-claude-review-bot reviewed Apr 11, 2026
Comment thread lightning/src/ln/channelmanager.rs Outdated

ldk-claude-review-bot reviewed Apr 11, 2026
Comment thread lightning/src/ln/channel.rs Outdated

ldk-claude-review-bot reviewed Apr 11, 2026
Comment thread lightning/src/ln/channelmanager.rs Outdated

ldk-claude-review-bot reviewed Apr 11, 2026
Comment thread lightning/src/ln/channel.rs Outdated

ldk-claude-review-bot reviewed Apr 11, 2026
Comment thread lightning/src/ln/chanmon_update_fail_tests.rs Outdated

Abeeujah force-pushed the parallelize-mppclaims branch from 395b30d to 88d62c4 April 12, 2026 09:17

Abeeujah requested a review from ldk-claude-review-bot April 12, 2026 09:24

ldk-claude-review-bot reviewed Apr 12, 2026
Comment thread lightning/src/ln/channelmanager.rs Outdated

ldk-claude-review-bot reviewed Apr 12, 2026
Comment thread lightning/src/ln/channel.rs Outdated

ldk-claude-review-bot reviewed Apr 12, 2026
Comment thread lightning/src/ln/channelmanager.rs Outdated

Abeeujah force-pushed the parallelize-mppclaims branch from 88d62c4 to 5880633 April 12, 2026 11:08

jkczyz requested review from TheBlueMatt and removed request for jkczyz April 14, 2026 22:09

Abeeujah force-pushed the parallelize-mppclaims branch from 5880633 to d801ec9 April 16, 2026 15:08

TheBlueMatt reviewed Apr 24, 2026

Abeeujah force-pushed the parallelize-mppclaims branch from d801ec9 to 5992824 April 27, 2026 14:00

ldk-claude-review-bot reviewed Apr 27, 2026
Comment thread lightning/src/ln/channelmanager.rs Outdated

Abeeujah force-pushed the parallelize-mppclaims branch from 5992824 to f466457 April 27, 2026 15:44

ldk-claude-review-bot reviewed Apr 27, 2026
Comment thread lightning/src/ln/channelmanager.rs Outdated

ldk-claude-review-bot reviewed Apr 27, 2026
Comment thread lightning/src/ln/channelmanager.rs Outdated

ldk-claude-review-bot reviewed Apr 27, 2026
Comment thread lightning/src/ln/channelmanager.rs Outdated

ldk-claude-review-bot reviewed Apr 27, 2026
Comment thread lightning/src/ln/channel.rs Outdated

TheBlueMatt reviewed May 1, 2026
Comment thread lightning/src/ln/channel.rs Outdated
Comment thread lightning/src/ln/channelmanager.rs Outdated

Abeeujah force-pushed the parallelize-mppclaims branch from f466457 to dddd924 May 13, 2026 18:26

ldk-claude-review-bot reviewed May 13, 2026
Comment thread lightning/src/ln/chanmon_update_fail_tests.rs Outdated

ldk-claude-review-bot reviewed May 13, 2026
Comment thread lightning/src/ln/channelmanager.rs Outdated

ldk-claude-review-bot reviewed May 13, 2026
Comment thread lightning/src/ln/channel.rs Outdated

Abeeujah force-pushed the parallelize-mppclaims branch 2 times, most recently from 086a669 to 2a28f1d May 14, 2026 12:39

ldk-claude-review-bot reviewed May 14, 2026
Comment thread lightning/src/ln/channelmanager.rs Outdated

ldk-claude-review-bot reviewed May 14, 2026
Comment thread lightning/src/ln/channelmanager.rs Outdated

ldk-claude-review-bot reviewed May 14, 2026
Comment thread lightning/src/ln/channelmanager.rs Outdated

Abeeujah force-pushed the parallelize-mppclaims branch from 2a28f1d to fc83c75 May 14, 2026 16:24

Abeeujah requested a review from TheBlueMatt May 15, 2026 11:33

TheBlueMatt reviewed May 18, 2026

Abeeujah force-pushed the parallelize-mppclaims branch from fc83c75 to 4e596b0 May 29, 2026 20:38

ldk-claude-review-bot reviewed May 29, 2026

Abeeujah force-pushed the parallelize-mppclaims branch from 4e596b0 to 6abcee3 May 29, 2026 21:15

ldk-claude-review-bot reviewed May 29, 2026

Abeeujah force-pushed the parallelize-mppclaims branch from 6abcee3 to 92fd2d9 May 29, 2026 22:02

ldk-claude-review-bot reviewed May 29, 2026

Abeeujah force-pushed the parallelize-mppclaims branch from 92fd2d9 to 6f6a028 May 29, 2026 23:09
