Conversation

joostjager
Contributor

@joostjager joostjager commented Jul 15, 2025

Async filesystem store with eventually consistent writes. It just uses tokio's spawn_blocking, because that is what tokio::fs would do internally anyway. Using tokio::fs directly would make it complicated to reuse the sync code.

ldk-node try out: lightningdevkit/ldk-node@main...joostjager:ldk-node:async-fsstore

@ldk-reviews-bot

ldk-reviews-bot commented Jul 15, 2025

👋 Thanks for assigning @tnull as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

@joostjager joostjager changed the title Async fsstore Async FilesystemStore Jul 15, 2025
@joostjager joostjager force-pushed the async-fsstore branch 4 times, most recently from 29b8bcf to 81ad668 Compare July 15, 2025 13:40
let this = Arc::clone(&self.inner);

Box::pin(async move {
	tokio::task::spawn_blocking(move || {
Contributor

Mhh, so I'm not sure if spawning blocking tasks for every IO call is the way to go (see for example https://docs.rs/tokio/latest/tokio/fs/index.html#tuning-your-file-io: "To get good performance with file IO on Tokio, it is recommended to batch your operations into as few spawn_blocking calls as possible."). Maybe there are other designs that we should at least consider before moving forward with this approach. For example, we could create a dedicated pool of longer-lived worker task(s) that process a queue?

If we use spawn_blocking, can we give the user control over which runtime this exactly will be spawned on? Also, rather than just doing wrapping, should we be using tokio::fs?

Contributor Author

Mhh, so I'm not sure if spawning blocking tasks for every IO call is the way to go (see for example https://docs.rs/tokio/latest/tokio/fs/index.html#tuning-your-file-io: "To get good performance with file IO on Tokio, it is recommended to batch your operations into as few spawn_blocking calls as possible.").

If we should batch operations, I think the current approach is better than using tokio::fs, because it already batches the various operations inside KVStoreSync::write.

Further batching probably needs to happen at a higher level in LDK, and might be a bigger change. Not sure if that is worth it just for FilesystemStore, especially when that store is not the preferred store for real-world usage?

For example, we could create a dedicated pool of longer-lived worker task(s) that process a queue?

Isn't Tokio doing that already when a task is spawned?

If we use spawn_blocking, can we give the user control over which runtime this exactly will be spawned on? Also, rather than just doing wrapping, should we be using tokio::fs?

With tokio::fs, the current runtime is used. I'd think that is then also sufficient if we spawn ourselves, without a need to specify which runtime exactly?

More generally, I think the main purpose of this PR is to show how an async kvstore could be implemented, and to have something for testing potentially. Additionally if there are users that really want to use this type of store in production, they could. But I don't think it is something to spend too much time on. A remote database is probably the more important target to design for.

Contributor

With tokio::fs, the current runtime is used. I'd think that is then also sufficient if we spawn ourselves, without a need to specify which runtime exactly?

Hmm, I'm not entirely sure. Especially for users that have multiple runtime contexts floating around, it might be important to make sure the store uses a particular one (cc @domZippilli?). I'll also have to think through this for LDK Node when we make the switch to async KVStore there, but happy to leave as-is for now.
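
For reference, here is a minimal, self-contained sketch of the shape being discussed, using simplified stand-in traits (the real KVStoreSync/KVStore traits take namespaces and differ in detail, and how to surface a JoinError from spawn_blocking is debated further down in this PR):

```rust
use std::future::Future;
use std::io;
use std::pin::Pin;
use std::sync::Arc;

// Simplified stand-ins for the sync and async store traits; illustrative only.
pub trait KVStoreSync: Send + Sync {
	fn read(&self, key: &str) -> Result<Vec<u8>, io::Error>;
	fn write(&self, key: &str, value: Vec<u8>) -> Result<(), io::Error>;
}

pub trait KVStore {
	fn read(&self, key: &str) -> Pin<Box<dyn Future<Output = Result<Vec<u8>, io::Error>> + Send>>;
	fn write(&self, key: &str, value: Vec<u8>) -> Pin<Box<dyn Future<Output = Result<(), io::Error>> + Send>>;
}

// Async facade that offloads each blocking call to tokio's blocking pool, which is what
// tokio::fs does internally, while reusing the existing sync code unchanged.
pub struct AsyncStore<S: KVStoreSync + 'static> {
	inner: Arc<S>,
}

impl<S: KVStoreSync + 'static> KVStore for AsyncStore<S> {
	fn read(&self, key: &str) -> Pin<Box<dyn Future<Output = Result<Vec<u8>, io::Error>> + Send>> {
		let this = Arc::clone(&self.inner);
		let key = key.to_owned();
		Box::pin(async move {
			// Mapping the JoinError to io::Error is just one option here.
			tokio::task::spawn_blocking(move || this.read(&key))
				.await
				.map_err(|e| io::Error::new(io::ErrorKind::Other, e))?
		})
	}

	fn write(&self, key: &str, value: Vec<u8>) -> Pin<Box<dyn Future<Output = Result<(), io::Error>> + Send>> {
		let this = Arc::clone(&self.inner);
		let key = key.to_owned();
		Box::pin(async move {
			tokio::task::spawn_blocking(move || this.write(&key, value))
				.await
				.map_err(|e| io::Error::new(io::ErrorKind::Other, e))?
		})
	}
}
```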

}

/// Provides additional interface methods that are required for [`KVStore`]-to-[`KVStore`]
/// data migration.
pub trait MigratableKVStore: KVStore {
pub trait MigratableKVStore: KVStoreSync {
Contributor

How will we solve this for a KVStore?

Contributor Author

I think this comment belongs in #3905?

We might not need to solve it now, as long as we still require a sync implementation alongside an async one? If we support async-only kvstores, then we can create an async version of this trait?
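
For illustration, an async version could look roughly like the following (hypothetical trait and method; not an existing LDK API):

```rust
use std::future::Future;
use std::io;
use std::pin::Pin;

/// Hypothetical async counterpart of `MigratableKVStore`, only needed once async-only
/// stores are supported; the real migration trait may expose different methods.
pub trait MigratableKVStoreAsync {
	/// Returns all known keys so that a full copy to another store can be driven.
	fn list_all_keys(
		&self,
	) -> Pin<Box<dyn Future<Output = Result<Vec<String>, io::Error>> + Send>>;
}
```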

@joostjager
Contributor Author

Removed garbage collector, because we need to keep the last written version.

@joostjager joostjager self-assigned this Jul 17, 2025
@joostjager joostjager mentioned this pull request Jul 17, 2025
24 tasks
@joostjager joostjager force-pushed the async-fsstore branch 2 times, most recently from 97d6b3f to 02dce94 Compare July 23, 2025 18:11

codecov bot commented Jul 23, 2025

Codecov Report

❌ Patch coverage is 91.32231% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.75%. Comparing base (c2d9b97) to head (f72a43c).
⚠️ Report is 107 commits behind head on main.

Files with missing lines Patch % Lines
lightning-persister/src/fs_store.rs 91.32% 10 Missing and 11 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3931      +/-   ##
==========================================
- Coverage   88.77%   88.75%   -0.03%     
==========================================
  Files         175      176       +1     
  Lines      127760   129294    +1534     
  Branches   127760   129294    +1534     
==========================================
+ Hits       113425   114753    +1328     
- Misses      11780    11937     +157     
- Partials     2555     2604      +49     
Flag Coverage Δ
fuzzing 22.03% <47.08%> (-0.08%) ⬇️
tests 88.58% <91.32%> (-0.03%) ⬇️

☔ View full report in Codecov by Sentry.

@joostjager joostjager force-pushed the async-fsstore branch 2 times, most recently from c061fcd to 2492508 Compare July 24, 2025 08:31
@joostjager joostjager marked this pull request as ready for review July 24, 2025 08:32
@ldk-reviews-bot ldk-reviews-bot requested a review from tankyleo July 24, 2025 08:32
@joostjager joostjager force-pushed the async-fsstore branch 2 times, most recently from 9938dfe to 7d98528 Compare July 24, 2025 09:39
@joostjager joostjager force-pushed the async-fsstore branch 5 times, most recently from 38ab949 to dd9e1b5 Compare July 25, 2025 13:39
@joostjager
Contributor Author

joostjager commented Jul 25, 2025

Updated code to not use an async wrapper, but conditionally expose the async KVStore trait on FilesystemStore.

I didn't yet update the ldk-node branch using this PR, because it seems many other things broke in main again.
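
A rough sketch of the conditional-exposure idea (the actual PR implements the async KVStore trait rather than an inherent method, and the names here are illustrative):

```rust
use std::io;
use std::path::PathBuf;

// Illustrative only: the async API is compiled in only when the crate's `tokio` feature
// is enabled, so sync-only users are unaffected.
pub struct FilesystemStore {
	data_dir: PathBuf,
}

impl FilesystemStore {
	/// Sync read, always available.
	pub fn read_sync(&self, key: &str) -> Result<Vec<u8>, io::Error> {
		std::fs::read(self.data_dir.join(key))
	}

	/// Async read, only exposed with the `tokio` feature.
	#[cfg(feature = "tokio")]
	pub async fn read_async(&self, key: &str) -> Result<Vec<u8>, io::Error> {
		let path = self.data_dir.join(key);
		tokio::task::spawn_blocking(move || std::fs::read(path))
			.await
			.map_err(|e| io::Error::new(io::ErrorKind::Other, e))?
	}
}
```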

@joostjager joostjager requested a review from tnull July 25, 2025 13:51
@joostjager
Contributor Author

Fuzz passes, but some unexpected log lines show up:

Sz:16 Tm:78,882us (i/b/h/e/p/c) New:0/0/0/0/0/9, Cur:0/0/0/1108/49/48567
Sz:5 Tm:17,602us (i/b/h/e/p/c) New:0/0/0/0/0/16, Cur:0/0/0/1108/49/48583
Sz:1 Tm:17,008us (i/b/h/e/p/c) New:0/0/0/0/0/5, Cur:0/0/0/1108/49/48588
Sz:22 Tm:75,985us (i/b/h/e/p/c) New:0/0/0/1/0/27, Cur:0/0/0/1109/49/48615
[2025-08-25T13:56:42+0000][W][28702] subproc_checkTimeLimit():532 pid=28711 took too much time (limit 1 s). Killing it with SIGKILL
[2025-08-25T13:56:42+0000][W][28703] subproc_checkTimeLimit():532 pid=28715 took too much time (limit 1 s). Killing it with SIGKILL
Sz:44 Tm:137,892us (i/b/h/e/p/c) New:0/0/0/1/0/32, Cur:0/0/0/1110/49/48647
Sz:7 Tm:71,188us (i/b/h/e/p/c) New:0/0/0/0/0/5, Cur:0/0/0/1110/49/48652
[2025-08-25T13:56:42+0000][W][28703] arch_checkWait():237 Persistent mode: pid=28715 exited with status: SIGNALED, signal: 9 (Killed)
Sz:3138 Tm:1,007,070us (i/b/h/e/p/c) New:0/0/0/263/3/13664, Cur:0/0/0/1373/52/62316
Sz:7 Tm:38,881us (i/b/h/e/p/c) New:0/0/0/3/0/55, Cur:0/0/0/1376/52/62371
[2025-08-25T13:56:42+0000][W][28702] arch_checkWait():237 Persistent mode: pid=28711 exited with status: SIGNALED, signal: 9 (Killed)
Sz:5662 Tm:1,013,168us (i/b/h/e/p/c) New:0/0/0/2/0/41, Cur:0/0/0/1378/52/62412
Persistent mode: Launched new persistent pid=30858
Persistent mode: Launched new persistent pid=30878

@joostjager
Contributor Author

Using /dev/shm as a ramdisk, if present, fixed the timeouts.
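
A sketch of what that check can look like in the fuzz target (helper name and fallback are illustrative, not the PR's exact code):

```rust
use std::path::PathBuf;

// Prefer /dev/shm as a RAM-backed scratch directory when it exists, so filesystem
// latency doesn't trip the fuzzer's per-iteration time limit; fall back to the regular
// temp dir otherwise.
fn fuzz_scratch_dir() -> PathBuf {
	let shm = PathBuf::from("/dev/shm");
	if shm.is_dir() {
		shm.join("fs_store_fuzz")
	} else {
		std::env::temp_dir().join("fs_store_fuzz")
	}
}
```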

@joostjager
Contributor Author

Tested with a RAM disk on macOS using https://github.com/conorarmstrong/macOS-ramdisk, to check whether it isn't now too fast to catch problems. I think it is ok. On my machine the RAM disk is about 10x faster than disk. Also, when the is_stale_version check is removed, the fuzzer catches it.

@joostjager
Contributor Author

fs_store fuzz stats now:

Summary iterations:100008 time:969 speed:103 crashes_count:0 timeout_count:339 new_units_added:1085 slowest_unit_ms:1114 guard_nb:686400 branch_coverage_percent:0 peak_rss_mb:67

969 secs, maybe reduce iterations somewhat?

@joostjager
Contributor Author

Squashed the commits. Given the number of fixup commits that had accumulated, I don't think it would be helpful anymore to keep them.

@joostjager joostjager requested a review from TheBlueMatt August 26, 2025 15:31
TheBlueMatt
TheBlueMatt previously approved these changes Aug 26, 2025
Collaborator

@TheBlueMatt TheBlueMatt left a comment

A few minor notes, but all of them can reasonably be addressed in a follow-up.


impl Drop for TempFilesystemStore {
	fn drop(&mut self) {
		_ = fs::remove_dir_all(&self.temp_path)
Collaborator

We need to make sure all the spawned tasks have finished before we do this. Otherwise cleanup won't work, because the async task will re-create the directory as part of its write.

Contributor Author

Good point. At first I thought about doing something smart in drop, but at that point we don't have the list of handles, and I don't think we can await in drop either. So I just added a final wait at the end of the test fn and avoided early returns.
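
A minimal, self-contained sketch of the shape of the fix (the actual test code in the PR differs; the store and helper names here are hypothetical):

```rust
use std::fs;
use std::path::PathBuf;

struct TempFilesystemStore {
	temp_path: PathBuf,
}

impl Drop for TempFilesystemStore {
	fn drop(&mut self) {
		// Only safe once every spawned write has finished, otherwise a still-running
		// blocking task may re-create the directory after it was removed.
		let _ = fs::remove_dir_all(&self.temp_path);
	}
}

#[tokio::test]
async fn async_writes_then_cleanup() {
	let temp_path = std::env::temp_dir().join("fsstore_drop_test");
	let store = TempFilesystemStore { temp_path: temp_path.clone() };

	// Kick off a few writes on the blocking pool and keep their handles.
	let mut handles = Vec::new();
	for i in 0..4 {
		let dir = temp_path.clone();
		handles.push(tokio::task::spawn_blocking(move || {
			fs::create_dir_all(&dir).unwrap();
			fs::write(dir.join(format!("key_{i}")), b"value").unwrap();
		}));
	}

	// Final wait at the end of the test fn (no early returns), so that drop-based
	// cleanup cannot race with in-flight writes.
	for handle in handles {
		handle.await.unwrap();
	}

	drop(store);
}
```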

@joostjager
Contributor Author

I will address comments in this PR. I definitely want to merge, but there are no conflicts with other PRs, so an early merge isn't really justified.

TheBlueMatt
TheBlueMatt previously approved these changes Aug 27, 2025
@TheBlueMatt
Collaborator

Oh, I guess this needs squashing, but feel free.

@joostjager joostjager requested a review from tnull August 27, 2025 13:56
@joostjager
Contributor Author

Yes, I kept the fixup because in this case I thought it would make re-review easier. Will let Elias take a look next week and then squash.

@ldk-reviews-bot

🔔 1st Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@ldk-reviews-bot

🔔 2nd Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

Contributor

@tnull tnull left a comment

LGTM, please squash.

All this version/inflight writes tracking seems kind of independent from FilesystemStore-specifics to me. So I do still wonder if it would make sense to refactor out these parts to a reusable wrapper around a generic KVStore, rather than jumping through all these hoops in each implementation individually.
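
For illustration, such a wrapper could track per-key versions roughly like this (hypothetical names and structure; the tracking in this PR lives inside FilesystemStore and differs in detail):

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

struct VersionedWrites {
	// Last version handed out and last version actually persisted, per key.
	next_version: HashMap<String, u64>,
	written_version: HashMap<String, u64>,
}

// Generic wrapper around some store implementation `S` (methods on `S` omitted).
struct EventuallyConsistentStore<S> {
	inner: Arc<S>,
	state: Mutex<VersionedWrites>,
}

impl<S> EventuallyConsistentStore<S> {
	// Assign a monotonically increasing version to a pending write for `key`.
	fn begin_write(&self, key: &str) -> u64 {
		let mut state = self.state.lock().unwrap();
		let version = state.next_version.entry(key.to_owned()).or_insert(0);
		*version += 1;
		*version
	}

	// True if a newer write for `key` has already been persisted, in which case this
	// (stale) write can be skipped or its result discarded.
	fn is_stale_version(&self, key: &str, version: u64) -> bool {
		let state = self.state.lock().unwrap();
		state.written_version.get(key).copied().unwrap_or(0) > version
	}

	// Record that `version` has been persisted, keeping only the newest version.
	fn finish_write(&self, key: &str, version: u64) {
		let mut state = self.state.lock().unwrap();
		let written = state.written_version.entry(key.to_owned()).or_insert(0);
		if version > *written {
			*written = version;
		}
	}
}
```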

tnull
tnull previously approved these changes Sep 1, 2025
Contributor

@tnull tnull left a comment

Feel free to land whenever CI lets you, I think.

@TheBlueMatt
Collaborator

The fs_store fuzzer target is really sad in CI.

Collaborator

@TheBlueMatt TheBlueMatt left a comment

One issue but I'm just gonna land this and fix it in post.

};

Box::pin(async move {
	tokio::task::spawn_blocking(move || this.read(path)).await.unwrap_or_else(|e| {
Collaborator

The unwrap_or_else calls in these methods can/should be dropped.

Contributor Author

What to do then with JoinError if returned by spawn_blocking?

Contributor

What to do then with JoinError if returned by spawn_blocking?

#4047 (comment)

Contributor Author

Ah, thanks ;)
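
For context, one possible way to handle the JoinError, sketched here as a standalone helper; this is illustrative only and not necessarily what the linked comment settled on: a JoinError only occurs if the blocking closure panicked or was cancelled, so the panic can be re-raised rather than mapped to an io::Error.

```rust
use std::io;

async fn run_blocking_read(
	read: impl FnOnce() -> Result<Vec<u8>, io::Error> + Send + 'static,
) -> Result<Vec<u8>, io::Error> {
	match tokio::task::spawn_blocking(read).await {
		Ok(res) => res,
		// Re-raise a panic from the blocking closure instead of converting it.
		Err(e) if e.is_panic() => std::panic::resume_unwind(e.into_panic()),
		Err(e) => panic!("blocking read task was cancelled: {e}"),
	}
}
```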

@TheBlueMatt TheBlueMatt merged commit b40dca0 into lightningdevkit:main Sep 2, 2025
25 checks passed