Async background persistence #3905
Conversation
lightning/src/util/sweep.rs
Outdated
fn persist_state<'a>(
	&self, sweeper_state: &SweeperState,
) -> Pin<Box<dyn Future<Output = Result<(), io::Error>> + 'a + Send>> {
	let encoded = &sweeper_state.encode();

	self.kv_store.write(
		OUTPUT_SWEEPER_PERSISTENCE_PRIMARY_NAMESPACE,
		OUTPUT_SWEEPER_PERSISTENCE_SECONDARY_NAMESPACE,
		OUTPUT_SWEEPER_PERSISTENCE_KEY,
		encoded,
	)
The encoded variable is captured by reference in the returned future, but it's a local variable that will be dropped when the function returns. This creates a potential use-after-free issue. Consider moving ownership of encoded into the future instead:
fn persist_state<'a>(
&self, sweeper_state: &SweeperState,
) -> Pin<Box<dyn Future<Output = Result<(), io::Error>> + 'a + Send>> {
let encoded = sweeper_state.encode();
self.kv_store.write(
OUTPUT_SWEEPER_PERSISTENCE_PRIMARY_NAMESPACE,
OUTPUT_SWEEPER_PERSISTENCE_SECONDARY_NAMESPACE,
OUTPUT_SWEEPER_PERSISTENCE_KEY,
&encoded,
)
}
This ensures the data remains valid for the lifetime of the future.
Is this real?
I don't think so, as the compiler would likely optimize that away, given that encoded will be an owned value (a Vec returned by encode()). Still, the change it suggests looks cleaner.
In general it will be super confusing that we encode at the time of creating the future, but only actually persist once we've dropped the lock. From now on we'll need to be super cautious about the side effects of interleaving persist calls.
The idea is that an async kv store encodes the data and stores the write action in a queue at the moment the future is created. Things should still happen in the original order.
Can you show a specific scenario where we have to be super cautious even if we have that queue?
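For concreteness, a minimal sketch of that queueing idea, with hypothetical names and the real write signature's namespace parameters collapsed into a single key; this is not the PR's actual implementation:

```rust
use std::collections::VecDeque;
use std::future::Future;
use std::io;
use std::pin::Pin;
use std::sync::{Arc, Mutex};

/// Hypothetical async store: `write` copies the data and enqueues the action
/// synchronously, so the relative order of writes is fixed at the moment each
/// future is created, not at the moment it is polled.
struct QueueingStore {
	/// FIFO queue of pending (key, value) writes, drained by a background task
	/// (not shown) that flushes them to disk in order.
	queue: Arc<Mutex<VecDeque<(String, Vec<u8>)>>>,
}

impl QueueingStore {
	fn write(
		&self, key: &str, buf: &[u8],
	) -> Pin<Box<dyn Future<Output = Result<(), io::Error>> + 'static + Send>> {
		// The copy and the enqueue happen now, before the future is returned.
		self.queue.lock().unwrap().push_back((key.to_string(), buf.to_vec()));
		// For brevity this future resolves immediately; a real implementation
		// would resolve it once the queued write has actually been flushed.
		Box::pin(async { Ok::<(), io::Error>(()) })
	}
}
```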
Moved &
The idea is that an async kv store encodes the data and stores the write action in a queue at the moment the future is created. Things should still happen in the original order.

If that is the idea we're going to assume in this PR, we should probably also start documenting these assumptions on KVStore in this PR already.
Added this requirement to the async KVStore trait doc.
As mentioned above, changes basically look good to me, although I'd prefer to avoid process_events_full_async by using the builder introduced in #3688.
But generally this should be good for a second reviewer, so pinging @TheBlueMatt.
lightning/src/util/persist.rs
Outdated
/// Trait that handles persisting a [`ChannelManager`], [`NetworkGraph`], and [`WriteableScore`] to disk.
///
/// [`ChannelManager`]: crate::ln::channelmanager::ChannelManager
pub trait Persister<'a, CM: Deref, L: Deref, S: Deref>
Given Persister is only used in lightning-background-processor and we've been migrating to just using KVStore everywhere (e.g. it's now required to use Sweeper), maybe we just kill off Persister entirely? We had Persister before we had KVStore as a way to persist the objects that the BP wanted to persist. To simplify the interface, we added the KVStore as a pseudo-wrapper around Persister. But since Sweeper now requires a KVStore explicitly, users can no longer only implement Persister, making it basically useless.

The only reason to keep it would be to avoid building the encoded Vec<u8> of the network graph and scorer for users who are avoiding persisting those objects, but I'm not entirely sure avoiding the ~50MiB memory spike during write is worth it.
Added a first commit that removes Persister. Rebased the rest. Lots of simplification.
lightning/src/util/sweep.rs
Outdated
@@ -922,6 +930,173 @@ where
	}
}

/// A wrapper around [`OutputSweeper`] to be used with a sync kv store.
pub struct OutputSweeperSyncKVStore<
I'm also not clear on why this needs a separate wrapper. We can have a second constructor on OutputSweeper that does the KVStore sync wrapping before returning a fully-async OutputSweeper. The only difference between this and that is that track_spendable_outputs would go from async to sync, but in this case it's a user who already has support for calling most things async, so I'm not sure why we really care.
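A rough sketch of that suggestion, with simplified trait shapes standing in for the real sync and async KVStore traits (a single key here instead of the actual namespace parameters); the wrapper and trait names are assumptions for illustration only:

```rust
use std::future::Future;
use std::io;
use std::pin::Pin;
use std::sync::Arc;

/// Simplified stand-in for the sync KVStore trait.
trait KVStoreSync: Send + Sync {
	fn write(&self, key: &str, buf: &[u8]) -> Result<(), io::Error>;
}

/// Simplified stand-in for the async KVStore trait.
trait KVStore: Send + Sync {
	fn write(
		&self, key: &str, buf: &[u8],
	) -> Pin<Box<dyn Future<Output = Result<(), io::Error>> + 'static + Send>>;
}

/// Adapter that lets a sync store be used wherever an async store is expected:
/// the write completes synchronously and the returned future is already resolved.
struct SyncKVStoreWrapper<S: KVStoreSync>(Arc<S>);

impl<S: KVStoreSync> KVStore for SyncKVStoreWrapper<S> {
	fn write(
		&self, key: &str, buf: &[u8],
	) -> Pin<Box<dyn Future<Output = Result<(), io::Error>> + 'static + Send>> {
		let res = self.0.write(key, buf);
		Box::pin(async move { res })
	}
}
```

A second OutputSweeper constructor could then accept the sync store, wrap it like this internally, and return the ordinary fully-async sweeper, so no separate OutputSweeperSyncKVStore type would be needed.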
Persister trait and async OutputSweeper persistence
In preparation for the addition of an async KVStore, we here remove the Persister pseudo-wrapper. The wrapper is thin, would need to be duplicated for async, and KVStore isn't fully abstracted anyway anymore because the sweeper takes it directly.
Almost LGTM, just one real comment and a doc nit.
lightning/src/util/sweep.rs
Outdated
}

	output_info.status.broadcast(cur_hash, cur_height, spending_tx.clone());
	self.broadcaster.broadcast_transactions(&[&spending_tx]);
Hmm, it used to be the case that we'd first persist, wait for that to finish, then broadcast. I don't think it's critical, but it does seem like we should retain that behavior.
Changed to first await the persist future, and then broadcast.
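As a minimal illustration of the resulting order, with hypothetical helper functions standing in for the sweeper's real methods:

```rust
use std::io;

/// Placeholder for the spending transaction type.
struct Tx;

/// Stand-in for persisting the updated sweeper state via the async KVStore.
async fn persist_state() -> Result<(), io::Error> {
	Ok(())
}

/// Stand-in for handing the transaction to the broadcaster.
fn broadcast(_tx: &Tx) {}

/// The state change is made durable first; the transaction is only broadcast
/// once the persist future has completed, matching the previous sync behavior.
async fn persist_then_broadcast(spending_tx: Tx) -> Result<(), io::Error> {
	persist_state().await?;
	broadcast(&spending_tx);
	Ok(())
}
```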
One question about the requirements we want, but figuring out the answer doesn't have to block landing this PR as-is.
) -> Result<Vec<u8>, io::Error>;
/// Persists the given data under the given `key`.
) -> Pin<Box<dyn Future<Output = Result<Vec<u8>, io::Error>> + 'static + Send>>;
/// Persists the given data under the given `key`. Note that the order of multiple writes calls needs to be retained
Oh actually, do we want this to be the restriction, or do we want "the order of multiple writes to the same key needs to be retained"? I imagine the second; we don't currently have a need inside LDK to require a strict total order, and it could definitely substantially slow down async persist. cc @tnull
One related thing I've been thinking about is whether it is okay to skip a stale write. If two consecutive same-key writes are executed out of order, is it fine to simply drop the first write? Or could it be that we do need to read that first written data at some point?
I don't see how it could not be okay - writes overwrite, so if there are two writes to the same key we're required to eventually end up with the second one on disk. The only question, I guess, is whether we're allowed to complete the second future first, then the first future later, and still end up with the second future's write. I think that's something we should accept (and document?) but that's the only caller-observable question, I think.
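A small sketch of what per-key "last write wins" bookkeeping could look like on the store side, assuming only per-key ordering is required (so writes to different keys need not be serialized); all names here are hypothetical, not the PR's actual code:

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical backend bookkeeping: every write to a key gets a per-key
/// sequence number when it is issued, and a write that completes out of order
/// is dropped if a newer write for that key has already been applied.
struct VersionedBackend {
	/// Latest sequence number handed out per key.
	issued: Mutex<HashMap<String, u64>>,
	/// What the backend currently holds: (sequence number, data) per key.
	applied: Mutex<HashMap<String, (u64, Vec<u8>)>>,
}

impl VersionedBackend {
	/// Called when a write is issued; this fixes the write's position in the
	/// per-key order, independent of when the backend gets around to it.
	fn begin_write(&self, key: &str) -> u64 {
		let mut issued = self.issued.lock().unwrap();
		let seq = issued.entry(key.to_string()).or_insert(0);
		*seq += 1;
		*seq
	}

	/// Called when the backend applies the write, possibly out of order.
	fn apply_write(&self, key: &str, seq: u64, buf: Vec<u8>) {
		let mut applied = self.applied.lock().unwrap();
		// If a newer write for this key already landed, this one is stale and
		// can simply be dropped: the end state is the latest write either way.
		let is_stale = applied.get(key).map_or(false, |entry| entry.0 > seq);
		if !is_stale {
			applied.insert(key.to_string(), (seq, buf));
		}
	}
}
```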
I was thinking of a write -> read -> write pattern, but I believe we already established that that isn't happening in LDK. We weren't going to do ordering for reads anyway.
I was thinking of a write -> read -> write pattern,
Hmm, that's indeed a good question, i.e., whether we'd also need to deal with interleaving reads, as otherwise we may actually end up reading data that was written later?
but I believe we already established that that isn't happening in LDK.
I'm not sure where we established that, but for LDK that def. won't be the case for much longer, as we'll want to migrate to stores that are not completely held in-memory, and we'll read data on-demand on cache failures.
I was thinking of a write -> read -> write pattern, but I believe we already established that that isn't happening in LDK. We weren't going to do ordering for reads anyway.
Hmm, that's indeed a good question, i.e., whether we'd also need to deal with interleaving reads, as otherwise we may actually end up reading data that was written later?
I don't see an issue here - after the storer calls write, the data may be in place (i.e. returned by a call to read), and after write's future completes it will be in place. That is implicit in the API, and is in fact required by any similar-looking API - you cannot know what is happening after you start the write call, so relying on anything other than the above would obviously be race-y. The same holds for multiple calls to write to the same key.
Stripped down version of #3778. It allows background persistence to be async, but channel monitor persistence remains sync. This means that for the time being, users wanting async background persistence would be required to implement both the sync and the async KVStore trait. This model is available through process_events_full_async. process_events_async still takes a synchronous kv store to remain backwards compatible.

Usage in ldk-node: lightningdevkit/ldk-node@main...joostjager:ldk-node:upgrade-to-async-kvstore