
Async background persistence #3905


Open
joostjager wants to merge 3 commits into main from the async-persister branch

Conversation

@joostjager (Contributor) commented Jul 2, 2025:

Stripped-down version of #3778. It allows background persistence to be async, while channel monitor persistence remains sync. This means that, for the time being, users who want async background persistence must implement both the sync and the async KVStore traits. This model is available through process_events_full_async.

process_events_async still takes a synchronous kv store to remain backwards compatible.

Usage in ldk-node: lightningdevkit/ldk-node@main...joostjager:ldk-node:upgrade-to-async-kvstore
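For context, the async store discussed in this PR hands back boxed futures from its read/write methods. The rough shape sketched below is an assumption based on the snippets quoted later in this thread; the trait and parameter names are illustrative, not the PR's exact definitions:

    use std::future::Future;
    use std::io;
    use std::pin::Pin;

    /// Illustrative async key-value store shape (assumed, not the PR's exact trait).
    pub trait KVStoreAsync {
        /// Reads the value stored under the given namespaces and key.
        fn read(
            &self, primary_namespace: &str, secondary_namespace: &str, key: &str,
        ) -> Pin<Box<dyn Future<Output = Result<Vec<u8>, io::Error>> + 'static + Send>>;

        /// Persists the given data under the given key. The order of write calls
        /// must be retained (see the discussion further down in this thread).
        fn write(
            &self, primary_namespace: &str, secondary_namespace: &str, key: &str, buf: &[u8],
        ) -> Pin<Box<dyn Future<Output = Result<(), io::Error>> + 'static + Send>>;
    }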

@ldk-reviews-bot commented Jul 2, 2025:

👋 Thanks for assigning @TheBlueMatt as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

@joostjager joostjager force-pushed the async-persister branch 7 times, most recently from 3fb7d6b to 1847e8d on July 3, 2025 09:57
@joostjager joostjager self-assigned this Jul 3, 2025
@joostjager joostjager force-pushed the async-persister branch 2 times, most recently from 1f59bbe to 723a5a6 on July 3, 2025 11:52
@joostjager joostjager mentioned this pull request May 12, 2025
@TheBlueMatt TheBlueMatt linked an issue Jul 7, 2025 that may be closed by this pull request
@joostjager joostjager force-pushed the async-persister branch 10 times, most recently from bc9c29a to 90ab1ba on July 9, 2025 09:52
@joostjager joostjager marked this pull request as ready for review July 9, 2025 09:52
@joostjager joostjager requested a review from tnull July 9, 2025 09:52
Comment on lines 631 to 693
    fn persist_state<'a>(
        &self, sweeper_state: &SweeperState,
    ) -> Pin<Box<dyn Future<Output = Result<(), io::Error>> + 'a + Send>> {
        let encoded = &sweeper_state.encode();

        self.kv_store.write(
            OUTPUT_SWEEPER_PERSISTENCE_PRIMARY_NAMESPACE,
            OUTPUT_SWEEPER_PERSISTENCE_SECONDARY_NAMESPACE,
            OUTPUT_SWEEPER_PERSISTENCE_KEY,
            encoded,
        )
The encoded variable is captured by reference in the returned future, but it's a local variable that will be dropped when the function returns. This creates a potential use-after-free issue. Consider moving ownership of encoded into the future instead:

fn persist_state<'a>(
    &self, sweeper_state: &SweeperState,
) -> Pin<Box<dyn Future<Output = Result<(), io::Error>> + 'a + Send>> {
    let encoded = sweeper_state.encode();

    self.kv_store.write(
        OUTPUT_SWEEPER_PERSISTENCE_PRIMARY_NAMESPACE,
        OUTPUT_SWEEPER_PERSISTENCE_SECONDARY_NAMESPACE,
        OUTPUT_SWEEPER_PERSISTENCE_KEY,
        &encoded,
    )
}

This ensures the data remains valid for the lifetime of the future.

Suggested change:

    -fn persist_state<'a>(
    -    &self, sweeper_state: &SweeperState,
    -) -> Pin<Box<dyn Future<Output = Result<(), io::Error>> + 'a + Send>> {
    -    let encoded = &sweeper_state.encode();
    -    self.kv_store.write(
    -        OUTPUT_SWEEPER_PERSISTENCE_PRIMARY_NAMESPACE,
    -        OUTPUT_SWEEPER_PERSISTENCE_SECONDARY_NAMESPACE,
    -        OUTPUT_SWEEPER_PERSISTENCE_KEY,
    -        encoded,
    -    )
    +fn persist_state<'a>(
    +    &self, sweeper_state: &SweeperState,
    +) -> Pin<Box<dyn Future<Output = Result<(), io::Error>> + 'a + Send>> {
    +    let encoded = sweeper_state.encode();
    +    self.kv_store.write(
    +        OUTPUT_SWEEPER_PERSISTENCE_PRIMARY_NAMESPACE,
    +        OUTPUT_SWEEPER_PERSISTENCE_SECONDARY_NAMESPACE,
    +        OUTPUT_SWEEPER_PERSISTENCE_KEY,
    +        &encoded,
    +    )
Spotted by Diamond


@joostjager (Contributor, Author):
Is this real?

Contributor:
I don't think so, as the compiler would likely optimize that away, given that encoded will be an owned value (a Vec returned by encode()). Still, the change it suggests looks cleaner.

In general it will be super confusing that we encode at the time of creating the future, but only actually persist once we've dropped the lock. From now on we'll need to be super cautious about the side effects of interleaving persist calls.

@joostjager (Contributor, Author):
The idea is that an async kv store encodes the data and stores the write action in a queue at the moment the future is created. Things should still happen in the original order.

Can you show a specific scenario where we have to be super cautious even if we have that queue?
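To illustrate the queueing idea: a store could snapshot the encoded data and enqueue the write synchronously when the future is created, and have every returned future flush the queue in FIFO order when awaited. A minimal sketch, assuming a simplified single-namespace backend; QueueingStore and SyncStore are hypothetical names, not LDK types:

    use std::collections::VecDeque;
    use std::future::Future;
    use std::io;
    use std::pin::Pin;
    use std::sync::{Arc, Mutex};

    /// Hypothetical synchronous backend, used only for this sketch.
    pub trait SyncStore: Send + Sync + 'static {
        fn write(&self, key: &str, buf: &[u8]) -> Result<(), io::Error>;
    }

    /// Snapshots the data and enqueues the write at future-creation time, so the
    /// on-disk order matches the order of the write calls even if the futures are
    /// awaited out of order.
    pub struct QueueingStore<S: SyncStore> {
        inner: Arc<S>,
        queue: Arc<Mutex<VecDeque<(String, Vec<u8>)>>>,
    }

    impl<S: SyncStore> QueueingStore<S> {
        pub fn write(
            &self, key: &str, buf: &[u8],
        ) -> Pin<Box<dyn Future<Output = Result<(), io::Error>> + 'static + Send>> {
            // Enqueue the already-encoded data now, before returning the future.
            self.queue.lock().unwrap().push_back((key.to_string(), buf.to_vec()));

            let inner = Arc::clone(&self.inner);
            let queue = Arc::clone(&self.queue);
            Box::pin(async move {
                // Whichever write future is awaited first flushes the queue in FIFO
                // order, so earlier writes always hit the backend before later ones.
                loop {
                    let next = queue.lock().unwrap().pop_front();
                    match next {
                        Some((key, buf)) => inner.write(&key, &buf)?,
                        None => return Ok(()),
                    }
                }
            })
        }
    }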

@joostjager (Contributor, Author) commented Jul 14, 2025:
Moved &

Contributor:
> The idea is that an async kv store encodes the data and stores the write action in a queue at the moment the future is created. Things should still happen in the original order.

If that is the idea we start assuming in this PR, we should probably also document these assumptions on KVStore in this PR already.

@joostjager (Contributor, Author):
Added this requirement to the async KVStore trait doc

@joostjager (Contributor, Author) commented:
@tnull changes made (diff) and replied to open threads.

@joostjager joostjager requested a review from tnull July 14, 2025 09:03
@tnull (Contributor) left a comment:
As mentioned above, changes basically look good to me, although I'd prefer to avoid process_events_full_async by using the builder introduced in #3688.

But, generally this should be good for a second reviewer, so pinging @TheBlueMatt.

@tnull tnull requested a review from TheBlueMatt July 15, 2025 09:16
/// Trait that handles persisting a [`ChannelManager`], [`NetworkGraph`], and [`WriteableScore`] to disk.
///
/// [`ChannelManager`]: crate::ln::channelmanager::ChannelManager
pub trait Persister<'a, CM: Deref, L: Deref, S: Deref>
Collaborator:
Given Persister is only used in lightning-background-processor and we've been migrating to just using KVStore everywhere (e.g., it's now required to use Sweeper), maybe we just kill off Persister entirely? We had Persister before we had KVStore, as a way to persist the objects that the BP wanted to persist. To simplify the interface, we added the KVStore as a pseudo-wrapper around Persister. But since Sweeper now requires a KVStore explicitly, users can no longer only implement Persister, making it basically useless.

The only reason to keep it would be to avoid building the encoded Vec<u8> of the network graph and scorer for users who are avoiding persisting those objects, but I'm not entirely sure avoiding the ~50MiB memory spike during write is worth it.

@joostjager (Contributor, Author):
Added a first commit that removes Persister. Rebased the rest. Lots of simplification.

@@ -922,6 +930,173 @@ where
}
}

/// A wrapper around [`OutputSweeper`] to be used with a sync kv store.
pub struct OutputSweeperSyncKVStore<
Collaborator:
I'm also not clear on why this needs a separate wrapper. We can have a second constructor on OutputSweeper that does the KVStore sync wrapping before returning a fully-async OutputSweeper. The only difference between this and that is that track_spendable_outputs would go from async to sync, but in this case it's a user who already has support for calling most things async, so I'm not sure why we really care.
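For illustration, the second-constructor approach could lean on a small adapter that performs the synchronous write eagerly and hands back an already-resolved future. A sketch under that assumption; KVStoreSync here is a simplified stand-in and SyncWrapper is a hypothetical name, not the wrapper in this PR:

    use std::future::Future;
    use std::io;
    use std::pin::Pin;
    use std::sync::Arc;

    /// Simplified stand-in for a synchronous KVStore, used only for this sketch.
    pub trait KVStoreSync: Send + Sync + 'static {
        fn write(
            &self, primary_namespace: &str, secondary_namespace: &str, key: &str, buf: &[u8],
        ) -> Result<(), io::Error>;
    }

    /// Exposes a sync store through the async interface by doing the blocking
    /// write up front and returning a future that is already complete.
    pub struct SyncWrapper<S: KVStoreSync>(pub Arc<S>);

    impl<S: KVStoreSync> SyncWrapper<S> {
        pub fn write(
            &self, primary_namespace: &str, secondary_namespace: &str, key: &str, buf: &[u8],
        ) -> Pin<Box<dyn Future<Output = Result<(), io::Error>> + 'static + Send>> {
            // The blocking write happens here, before the future is returned.
            let res = self.0.write(primary_namespace, secondary_namespace, key, buf);
            Box::pin(async move { res })
        }
    }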

@joostjager joostjager force-pushed the async-persister branch 5 times, most recently from 5988195 to cc5703e on July 18, 2025 06:06
@joostjager joostjager changed the title from "Async Persister trait and async OutputSweeper persistence" to "Async background persistence" on Jul 18, 2025
In preparation for the addition of an async KVStore, we here remove the
Persister pseudo-wrapper. The wrapper is thin, would need to be
duplicated for async, and KVStore isn't fully abstracted away anymore
anyway because the sweeper takes it directly.
@joostjager joostjager force-pushed the async-persister branch 3 times, most recently from 98cdc61 to fda8a58 on July 18, 2025 09:20
@TheBlueMatt (Collaborator) left a comment:
Almost LGTM, just one real comment and a doc nit.

}

output_info.status.broadcast(cur_hash, cur_height, spending_tx.clone());
self.broadcaster.broadcast_transactions(&[&spending_tx]);
Collaborator:
Hmm, it used to be the case that we'd first persist, wait for that to finish, then broadcast. I don't think it's critical, but it does seem like we should retain that behavior.

@joostjager (Contributor, Author):
Changed to first await the persist future, and then broadcast.

@ldk-reviews-bot:
🔔 1st Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@TheBlueMatt (Collaborator) left a comment:
One question about the requirements we want, but figuring out the answer doesn't have to block landing this PR as-is.

) -> Result<Vec<u8>, io::Error>;
/// Persists the given data under the given `key`.
) -> Pin<Box<dyn Future<Output = Result<Vec<u8>, io::Error>> + 'static + Send>>;
/// Persists the given data under the given `key`. Note that the order of multiple write calls needs to be retained
Collaborator:
Oh actually, do we want this to be the restriction, or do we want "the order of multiple writes to the same key needs to be retained"? I imagine the second; we don't currently have a need inside LDK to require a strict total order, and it could definitely substantially slow down async persist. cc @tnull

@joostjager (Contributor, Author):
One related thing I've been thinking about is whether it is okay to skip a stale write. If two consecutive same-key writes are executed out of order, is it fine to simply drop the first write? Or could it be that we do need to read that first written data at some point?

Collaborator:
I don't see how it could not be okay - writes overwrite, so if there are two writes to the same key we're required to eventually end up with the second one on disk. The only question, I guess, is whether we're allowed to complete the second future first, then the first future later, and still end up with the second future's write. I think that's something we should accept (and document?), but that's the only caller-observable question, I think.
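To make the "drop the stale write" option concrete, a backend could stamp each write with a per-key version at future-creation time and skip any completion older than the newest version already applied. A minimal sketch, assuming the later value must always win; all names are hypothetical:

    use std::collections::HashMap;
    use std::sync::Mutex;

    /// Tracks, per key, the last version handed out and the last version applied.
    /// A write whose version is older than the last applied one is stale and can
    /// be dropped, since a newer value is already on its way to disk.
    pub struct WriteVersions {
        versions: Mutex<HashMap<String, (u64, u64)>>, // (last_issued, last_applied)
    }

    impl WriteVersions {
        pub fn new() -> Self {
            Self { versions: Mutex::new(HashMap::new()) }
        }

        /// Called synchronously when the write future is created, so versions
        /// follow the order of the write calls themselves.
        pub fn next_version(&self, key: &str) -> u64 {
            let mut map = self.versions.lock().unwrap();
            let entry = map.entry(key.to_string()).or_insert((0, 0));
            entry.0 += 1;
            entry.0
        }

        /// Called when the backend is about to perform the (possibly reordered)
        /// write. Returns false if a newer write for this key was already applied,
        /// in which case this stale write is skipped.
        pub fn should_apply(&self, key: &str, version: u64) -> bool {
            let mut map = self.versions.lock().unwrap();
            let entry = map.entry(key.to_string()).or_insert((0, 0));
            if version > entry.1 {
                entry.1 = version;
                true
            } else {
                false
            }
        }
    }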

@joostjager (Contributor, Author):
I was thinking of a write -> read -> write pattern, but I believe we already established that that isn't happening in LDK. We weren't going to do ordering for reads anyway.

@tnull (Contributor) commented Jul 21, 2025:
> I was thinking of a write -> read -> write pattern,

Hmm, that's indeed a good question, i.e., whether we'd also need to deal with interleaving reads; otherwise we may end up reading data that was actually written later?

> but I believe we already established that that isn't happening in LDK.

I'm not sure where we established that, but for LDK that definitely won't be the case for much longer, as we'll want to migrate to stores that are not completely held in memory, and we'll read data on demand on cache misses.

@TheBlueMatt (Collaborator) commented Jul 21, 2025:
> I was thinking of a write -> read -> write pattern, but I believe we already established that that isn't happening in LDK. We weren't going to do ordering for reads anyway.
>
> Hmm, that's indeed a good question, i.e., whether we'd also need to deal with interleaving reads; otherwise we may end up reading data that was actually written later?

I don't see an issue here - after the storer calls write, the data may be in place (i.e., returned by a call to read), and after write's future completes it will be in place. That is implicit in the API, and is in fact required by any similar-looking API - you cannot know what is happening after you start the write call, so relying on anything other than the above would obviously be race-y. The same holds for multiple calls to write to the same key.

Successfully merging this pull request may close these issues.

Async KV Store Persister
4 participants