Async FilesystemStore #3931
👋 Thanks for assigning @tnull as a reviewer!
Force-pushed from 29b8bcf to 81ad668.
```rust
let this = Arc::clone(&self.inner);

Box::pin(async move {
	tokio::task::spawn_blocking(move || {
```
Mhh, so I'm not sure if spawning blocking tasks for every IO call is the way to go (see for example https://docs.rs/tokio/latest/tokio/fs/index.html#tuning-your-file-io: "To get good performance with file IO on Tokio, it is recommended to batch your operations into as few spawn_blocking calls as possible."). Maybe there are other designs that we should at least consider before moving forward with this approach. For example, we could create a dedicated pool of longer-lived worker task(s) that process a queue?
If we use `spawn_blocking`, can we give the user control over which runtime this exactly will be spawned on? Also, rather than just doing wrapping, should we be using `tokio::fs`?
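The dedicated-worker alternative mentioned above could look roughly like the following sketch. It uses a std channel and thread instead of tokio primitives to keep the example self-contained; `WriteOp` and `spawn_store_worker` are hypothetical names, not LDK API.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical operation handed to the long-lived worker.
struct WriteOp {
    key: String,
    value: Vec<u8>,
    // Channel used to report completion back to the caller.
    done: mpsc::Sender<std::io::Result<()>>,
}

// One long-lived worker drains a queue of operations, instead of
// spawning a fresh blocking task per IO call.
fn spawn_store_worker() -> mpsc::Sender<WriteOp> {
    let (tx, rx) = mpsc::channel::<WriteOp>();
    thread::spawn(move || {
        for op in rx {
            // A real store would perform the blocking filesystem write
            // for (op.key, op.value) here.
            let _ = (&op.key, &op.value);
            let _ = op.done.send(Ok(()));
        }
    });
    tx
}
```

An async caller would then only await the `done` channel, keeping all blocking IO on the single worker.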
> Mhh, so I'm not sure if spawning blocking tasks for every IO call is the way to go (see for example https://docs.rs/tokio/latest/tokio/fs/index.html#tuning-your-file-io: "To get good performance with file IO on Tokio, it is recommended to batch your operations into as few spawn_blocking calls as possible.").
If we should batch operations, I think the current approach is better than using `tokio::fs`, because it already batches the various operations inside `KVStoreSync::write`.
Further batching probably needs to happen at a higher level in LDK, and might be a bigger change. Not sure if that is worth it just for FilesystemStore, especially when that store is not the preferred store for real-world usage?
> For example, we could create a dedicated pool of longer-lived worker task(s) that process a queue?
Isn't Tokio doing that already when a task is spawned?
> If we use `spawn_blocking`, can we give the user control over which runtime this exactly will be spawned on? Also, rather than just doing wrapping, should we be using `tokio::fs`?
With `tokio::fs`, the current runtime is used. I'd think that is then also sufficient if we spawn ourselves, without a need to specify which runtime exactly?
More generally, I think the main purpose of this PR is to show how an async kvstore could be implemented, and to have something for testing potentially. Additionally if there are users that really want to use this type of store in production, they could. But I don't think it is something to spend too much time on. A remote database is probably the more important target to design for.
> With tokio::fs, the current runtime is used. I'd think that is then also sufficient if we spawn ourselves, without a need to specify which runtime exactly?
Hmm, I'm not entirely sure, especially for users that have multiple runtime contexts floating around, it might be important to make sure the store uses a particular one (cc @domZippilli ?). I'll also have to think through this for LDK Node when we make the switch to async KVStore there, but happy to leave as-is for now.
lightning/src/util/persist.rs (outdated)
```diff
 /// Provides additional interface methods that are required for [`KVStore`]-to-[`KVStore`]
 /// data migration.
-pub trait MigratableKVStore: KVStore {
+pub trait MigratableKVStore: KVStoreSync {
```
How will we solve this for a `KVStore`?
I think this comment belongs in #3905?
We might not need to solve it now, as long as we still require a sync implementation alongside an async one? If we support async-only kvstores, then we can create an async version of this trait?
Force-pushed from 81ad668 to e462bce.
Removed garbage collector, because we need to keep the last written version.
Force-pushed from 97d6b3f to 02dce94.
Codecov Report ❌ Patch coverage is …

```text
@@            Coverage Diff             @@
##             main    #3931      +/-   ##
==========================================
- Coverage   88.77%   88.75%   -0.03%
==========================================
  Files         175      176       +1
  Lines      127760   129294    +1534
  Branches   127760   129294    +1534
==========================================
+ Hits       113425   114753    +1328
- Misses      11780    11937     +157
- Partials     2555     2604      +49
```
Force-pushed from c061fcd to 2492508.
Force-pushed from 9938dfe to 7d98528.
Force-pushed from 38ab949 to dd9e1b5.
Updated code to not use an async wrapper, but conditionally expose the async … I didn't yet update the …
Fuzz passes, but some deviating log lines show up: …
Using /dev/shm as a ramdisk, if present, fixed the timeouts.
Tested with a RAM disk on macOS using the tool https://github.com/conorarmstrong/macOS-ramdisk, to see if it isn't now too fast to catch problems. I think it is ok. On my machine, the RAM disk is about 10x faster than disk. Also when removing the …
Force-pushed from 4efbeee to 2dbf59c.
fs_store fuzz stats now: 969 secs. Maybe reduce iterations somewhat?
Force-pushed from b41648b to a545f9b.
Squashed the commits. With the number of fixup commits that were there, I don't think it would be helpful anymore to have them.
A few minor notes, but all of them can reasonably be addressed in a followup.
```rust
impl Drop for TempFilesystemStore {
	fn drop(&mut self) {
		_ = fs::remove_dir_all(&self.temp_path)
```
We need to make sure all the spawned tasks have finished before we do this. Otherwise cleanup won't work, because the async task will re-create the directory as part of its write.
Good point. At first I thought to do something smart with drop, but at that point we don't have the list of handles, and we also can't await in drop, I think. So I just added the final wait to the end of the test fn and avoided early returns.
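The resulting ordering can be sketched with plain threads: join every writer before the temp-directory guard drops, so no in-flight write can re-create the directory after `remove_dir_all`. `TempDir` here is a hypothetical stand-in for `TempFilesystemStore`.

```rust
use std::fs;
use std::path::PathBuf;
use std::thread;

// Hypothetical stand-in for TempFilesystemStore: removes its directory on drop.
struct TempDir {
    path: PathBuf,
}

impl Drop for TempDir {
    fn drop(&mut self) {
        let _ = fs::remove_dir_all(&self.path);
    }
}

fn run_test() -> bool {
    let tmp = TempDir { path: std::env::temp_dir().join("fs_store_drop_demo") };
    fs::create_dir_all(&tmp.path).unwrap();

    let mut handles = Vec::new();
    for i in 0..4 {
        let dir = tmp.path.clone();
        handles.push(thread::spawn(move || {
            fs::write(dir.join(format!("key_{i}")), b"value").unwrap();
        }));
    }

    // Join all writers *before* `tmp` drops; otherwise a late write could
    // re-create the directory after remove_dir_all has run.
    for h in handles {
        h.join().unwrap();
    }
    true
}
```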
I will address comments in this PR. I definitely want to merge, but there are no conflicts with other PRs, so an early merge isn't really justified.
Oh, I guess this needs squashing, but feel free.
Yes, kept the fixup because in this case I thought it would make re-review easier. Will let Elias take a look next week and then squash. |
LGTM, please squash.
All this version/in-flight writes tracking seems kind of independent from FilesystemStore-specifics to me. So I do still wonder if it would make sense to refactor out these parts to a reusable wrapper around a generic `KVStore`, rather than jumping through all these hoops in each implementation individually.
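A reusable wrapper along those lines might, as a rough sketch, track per-key versions outside any concrete store so that stale in-flight writes can be detected generically. All names here are hypothetical, not the PR's actual code.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Hypothetical generic layer: per-key version counters that any KVStore
// implementation could share, instead of re-implementing the tracking.
struct VersionedStore {
    versions: Mutex<HashMap<String, u64>>,
}

impl VersionedStore {
    fn new() -> Self {
        Self { versions: Mutex::new(HashMap::new()) }
    }

    // Assign a monotonically increasing version to a new write for `key`.
    fn begin_write(&self, key: &str) -> u64 {
        let mut versions = self.versions.lock().unwrap();
        let v = versions.entry(key.to_string()).or_insert(0);
        *v += 1;
        *v
    }

    // Only the write carrying the latest version may be persisted; older
    // in-flight writes are recognized as stale and dropped.
    fn is_latest(&self, key: &str, version: u64) -> bool {
        self.versions.lock().unwrap().get(key) == Some(&version)
    }
}
```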
Force-pushed from 96091ec to 24a0fc7.
Feel free to land whenever CI lets you, I think.
The …
Force-pushed from 24a0fc7 to f72a43c.
One issue, but I'm just gonna land this and fix it in post.
```rust
};

Box::pin(async move {
	tokio::task::spawn_blocking(move || this.read(path)).await.unwrap_or_else(|e| {
```
The `unwrap_or_else` calls in these methods can/should be dropped.
What to do then with `JoinError` if returned by `spawn_blocking`?
> What to do then with `JoinError` if returned by `spawn_blocking`?
Ah, thanks ;)
Async filesystem store with eventually consistent writes. It is just using tokio's `spawn_blocking`, because that is what `tokio::fs` would otherwise do as well. Using `tokio::fs` would make it complicated to reuse the sync code.

ldk-node try-out: lightningdevkit/ldk-node@main...joostjager:ldk-node:async-fsstore