lfs_alloc() blocks on full filesystem. #1063

Open
ddomnik opened this issue Jan 11, 2025 · 3 comments

ddomnik commented Jan 11, 2025

This issue is derived from one in the Espressif port, joltwallet/esp_littlefs#213, but I think it fits better here.

Usually, on a full filesystem, littlefs returns from fwrite() within a few ms and prints:
E (2818207) esp_littlefs: ./managed_components/joltwallet__littlefs/src/littlefs/lfs.c:689:error: No more free space 0x48d

However, it could happen that the library gets "stuck" for a very long time (multiple seconds or even minutes - prob. depending on partition size). This happens exactly in this while loop. (The second while in lfs_alloc())

littlefs/lfs.c

Line 656 in d01280e

while (lfs->lookahead.next < lfs->lookahead.size) {

The problem is that almost every program uses watchdogs. Espressif's default watchdog has a maximum timeout of 60 seconds (see "maximum wdt timeout"), which in rare cases is not enough, and the watchdog then resets the whole device.

Maybe some experts know how to handle this issue.

Based on my current understanding of the issue, a very simple idea would be to give littlefs its own configurable watchdog. This would allow the issue to be caught at the application level rather than at the system level.

geky added the question label Feb 3, 2025

geky (Member) commented Feb 3, 2025

Hi @ddomnik, thanks for creating an issue.

One thing to note: while LittleFS does have some long-running operations, these should always involve IO operations of some sort. It should never get stuck in a CPU-only function. If it does, that's a bug and something a watchdog should catch.

One option is to reset the watchdog in the low-level bd read/prog/erase functions. This would avoid false positives with long-running filesystem work. This is also a good place for yield calls for multithreaded/coroutine systems.
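A minimal host-side sketch of this wrapping idea, assuming stand-in types that only mirror the shape of the littlefs block-device callbacks. `bd_config`, `raw_read`, `wdt_read`, `feed_watchdog`, and `wdt_feeds` are hypothetical names for illustration; on ESP-IDF the hook body would presumably call `esp_task_wdt_reset()`, and a real port would use the types from lfs.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the parts of `struct lfs_config` used here (hypothetical;
 * a real port would use the littlefs types from lfs.h). */
struct bd_config {
    void *context;
    int (*read)(const struct bd_config *c, uint32_t block,
                uint32_t off, void *buffer, uint32_t size);
};

/* Counts watchdog feeds so the sketch can be checked on a host. */
static unsigned wdt_feeds = 0;

/* Hypothetical hook: on a real target this is where the watchdog reset
 * (or a yield call) would go. */
static void feed_watchdog(void) {
    wdt_feeds++;
}

/* Hypothetical raw driver: a RAM-backed "flash" of 4 blocks x 16 bytes. */
static uint8_t fake_flash[4][16];

static int raw_read(const struct bd_config *c, uint32_t block,
                    uint32_t off, void *buffer, uint32_t size) {
    (void)c;
    if (block >= 4 || off > 16 || size > 16 - off) {
        return -1; /* out of range */
    }
    memcpy(buffer, &fake_flash[block][off], size);
    return 0;
}

/* Wrapper installed as the read callback: every IO operation feeds the
 * watchdog, so only a CPU-only hang can let it expire. */
static int wdt_read(const struct bd_config *c, uint32_t block,
                    uint32_t off, void *buffer, uint32_t size) {
    assert(buffer != NULL);
    feed_watchdog();
    return raw_read(c, block, off, buffer, size);
}
```

The same wrapper pattern would apply to the prog and erase callbacks. The filesystem then sees an ordinary block device, while the watchdog is fed once per IO operation.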

Having a second, much longer, watchdog at the application-level is still possible, as you note. You may be able to emulate it in software (via a counter that increments when you reset the low-level watchdog?) to avoid using an additional hardware resource.
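One way the counter-based emulation could look, as a rough sketch. All names (`soft_wdt`, `soft_wdt_kick`, `soft_wdt_feed`, `hw_resets`) are hypothetical, and the hardware watchdog reset is replaced by a plain counter so the logic can be exercised on a host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Software watchdog state (all names here are hypothetical). */
struct soft_wdt {
    uint32_t feeds;     /* low-level feeds since the last application kick */
    uint32_t limit;     /* feeds allowed before the soft watchdog trips */
    uint32_t hw_resets; /* stand-in counter for hardware watchdog resets */
};

/* Called by the application when it makes progress, for example before
 * each filesystem call. */
static void soft_wdt_kick(struct soft_wdt *w) {
    assert(w != NULL);
    w->feeds = 0;
}

/* Called from the block-device read/prog/erase callbacks. Resets the
 * hardware watchdog only while the budget lasts, and returns false once
 * the soft watchdog has tripped. */
static bool soft_wdt_feed(struct soft_wdt *w) {
    assert(w != NULL);
    if (w->feeds >= w->limit) {
        return false; /* tripped: stop feeding, let the caller react */
    }
    w->feeds++;
    w->hw_resets++; /* real code would reset the hardware watchdog here */
    return true;
}
```

When `soft_wdt_feed` returns false, the callback could return an error code so the filesystem call fails at the application level, or it could simply stop feeding and let the hardware watchdog expire. Which of the two is appropriate depends on the application.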


> However, it could happen that the library gets "stuck" for a very long time (multiple seconds or even minutes - prob. depending on partition size). This happens exactly in this while loop. (The second while in lfs_alloc())

This is most likely the allocator trying to scan the filesystem one last time to find any free blocks. The cost of this scan can be reduced, at the expense of RAM, by increasing cfg.lookahead_size, but the scan scales as $O\left(n^2/L\right)$, so it can still end up very expensive for large partitions.
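To get a feel for this scaling, here is a back-of-the-envelope sketch. It assumes that $n$ is the block count, that $L$ is the lookahead window, that each byte of the lookahead buffer tracks 8 blocks, and that filling each window costs one traversal touching up to $n$ blocks. The helper names are hypothetical and the result is a rough estimate, not a measurement.

```c
#include <assert.h>
#include <stdint.h>

/* Number of lookahead windows needed to cover every block once, assuming
 * each byte of the lookahead buffer tracks 8 blocks. */
static uint32_t lookahead_passes(uint32_t block_count,
                                 uint32_t lookahead_size) {
    assert(lookahead_size > 0);
    uint32_t window = 8 * lookahead_size;       /* blocks per window */
    return (block_count + window - 1) / window; /* ceiling division */
}

/* Rough worst-case work estimate: each window is filled by one traversal
 * that touches up to block_count blocks, so the total is about
 * passes * block_count block visits. */
static uint64_t worst_case_block_visits(uint32_t block_count,
                                        uint32_t lookahead_size) {
    return (uint64_t)lookahead_passes(block_count, lookahead_size)
            * block_count;
}
```

Under these assumptions, a 4096-block partition with a 16-byte lookahead buffer needs 32 windows to cover the disk, while a 128-byte buffer needs 4. Multiplying the buffer by 8 cuts the estimated work by 8, but doubling the partition size quadruples it.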

There is some ongoing work to introduce an optional block-map that will hopefully help with this.

BrianPugh (Contributor) commented:

> One option is to reset the watchdog in the low-level bd read/prog/erase functions.

That's an idea. We could add an additional layer of callbacks in esp_littlefs by adding four function handles to esp_vfs_littlefs_conf_t, corresponding to read/write/erase/sync. We could make them conditional on macros so that the structure is only larger for users who want the callbacks.

Additionally, we could just make a configuration that enables/disables automatic calling of esp_task_wdt_reset in the read/write/erase/sync functions.
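A rough sketch of what the build-time switch could look like. Everything here is hypothetical: `EXAMPLE_LITTLEFS_WDT_RESET` is not an existing esp_littlefs option, `example_erase` is a stand-in callback body, and `example_wdt_reset` stands in for `esp_task_wdt_reset()` so the sketch builds on a host.

```c
#include <assert.h>

/* Hypothetical build option; not an existing esp_littlefs setting. */
#ifndef EXAMPLE_LITTLEFS_WDT_RESET
#define EXAMPLE_LITTLEFS_WDT_RESET 1
#endif

/* Counts resets so the behaviour can be checked on a host. */
static unsigned wdt_resets = 0;

/* Stand-in for the task watchdog reset call. */
static void example_wdt_reset(void) {
    wdt_resets++;
}

/* The hook compiles to nothing when the option is disabled, so users who
 * do not want it pay no cost. */
#if EXAMPLE_LITTLEFS_WDT_RESET
#define EXAMPLE_BD_HOOK() example_wdt_reset()
#else
#define EXAMPLE_BD_HOOK() ((void)0)
#endif

/* Hypothetical erase callback body with the hook at its entry. */
static int example_erase(unsigned block, unsigned block_count) {
    assert(block_count > 0);
    EXAMPLE_BD_HOOK();
    if (block >= block_count) {
        return -1; /* out of range */
    }
    /* real code would erase the flash sector here */
    return 0;
}
```

The same hook would go at the top of the read, write, and sync functions. A callback-based variant would replace the macro body with a call through a user-supplied function pointer.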

If either of these is something we want to pursue, please follow up in joltwallet/esp_littlefs#213

geky (Member) commented Feb 13, 2025

It probably won't help you now, but eventually I'd like to redesign the bd API to take ctx directly. It's an unfortunate missed opportunity that the current API makes composability difficult...

But the bd redesign is currently mid-priority in a sea of high-priority todos...
