Skip to content

add async support using bisync crate #176

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: develop
Choose a base branch
from

Conversation

Be-ing
Copy link

@Be-ing Be-ing commented Feb 6, 2025

This moves the existing API into a blocking module and duplicates the API into a new asynchronous module, using async fn where applicable. This uses the bisync crate rather than maybe_async used by #154. This involved moving everything in the src directory into a new src/inner directory, then exposing blocking/async versions via how they are imported in src/lib.rs.

I think this resolves the concerns brought up in #154: sync and async can both be built at the same time, usage of bisync by other crates in the dependency graph doesn't affect this, and changes to the example and test code are minimal (just adjusting import path).

I moved submodules that have no difference between blocking & async to a new common module. That code gets re-exported under the blocking/asynchronous modules in the same relative path in the module hierarchy as before. There may be opportunities to split more code into that common module, but I have tried to keep changes to a minimum for now to make it easier to review.

I have not added Cargo features for sync/async; they're both built. If requested, I could put each behind a feature gate, though I don't think it's necessary.

Fixes #50

This was referenced Feb 6, 2025
@Be-ing Be-ing force-pushed the bisync branch 4 times, most recently from d7ba16b to 8c5f264 Compare February 6, 2025 07:59
Copy link
Member

@thejpster thejpster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An interesting approach! I want to dig into the docs in more detail but it seems workable - modulo that AFIT feature-flag that has appeared.

@thejpster
Copy link
Member

It might also be good to move the example import renames to another PR to keep this one cleaner so we can see precisely only that which is affected by the async additions.

@Be-ing
Copy link
Author

Be-ing commented Feb 6, 2025

It might also be good to move the example import renames to another PR to keep this one cleaner so we can see precisely only that which is affected by the async additions.

That goes towards a bigger question: do you want to break the public API by moving the existing API into a new blocking module? This can be avoided by making the blocking module private and instead putting pub use blocking::*; in src/lib.rs. I don't think that would be great for downstream users in the long term though. My primary concern is that it would make the documentation confusing to navigate. Here's how the front page of the documentation would look with that:

image

There's the whole blocking API of the crate, then the asynchronous module... and when you go to the documentation for that, you see the same API:

image

I think this would be rather confusing. It's difficult to even tell that I clicked a link and navigated to another page because it looks so similar to the documentation's front page.

By moving the existing API into a blocking module (how this branch currently is), the front page of the documentation looks like this:

image

@Be-ing
Copy link
Author

Be-ing commented Feb 6, 2025

I've avoided running cargo fmt for now to minimize the diff because it's rather big already. If you'd like that to be run now, just let me know, or I could do it later after you've reviewed.

@Be-ing
Copy link
Author

Be-ing commented Feb 6, 2025

I had to use the #[only_sync] macro in a few places. Those are the impl Drop for File and Volume; unfortunately, there is no such thing (yet) in Rust as an async drop, so that's the best we can do. VolumeManager::close_file and VolumeManager::close_file can be invoked manually though. I presume bad things would happen if those aren't closed manually? If so, I can add a note to the documentation that has to be done manually for the async versions.

The other places I used #[only_sync] were just to make the tests simple to implement.

@thejpster
Copy link
Member

I presume bad things would happen if those aren't closed manually?

Yes, leaking a file handle means you can't reopen it, and handles are a finite resource. Plus certain context is only written to disk on file close so the disk ends up needing a fsck/scandisk pass.

As long as that only affects the async users that's fine.

Moving the blocking API into the blocking module is fine with me.

@thejpster
Copy link
Member

Can you rebase? #177 might make this smaller.

@thejpster
Copy link
Member

Is it possible to give the async version a blocking drop? Better than doing nothing.

@Be-ing Be-ing force-pushed the bisync branch 2 times, most recently from d75eb0f to de430ba Compare February 12, 2025 06:53
@Be-ing
Copy link
Author

Be-ing commented Feb 12, 2025

Can you rebase? #177 might make this smaller.

Okay, rebased.

Is it possible to give the async version a blocking drop? Better than doing nothing.

Unfortunately not. That would require this crate to bring in its own async executor and set that up just within impl Drop. It's questionable if that would be desirable if it was even possible. In a library that uses std, you can wrap async functions with a simple async executor like futures_lite::future::block_on or pollster::block_on, but both of those use std modules like std::future and std::sync that aren't available in core.

@thejpster
Copy link
Member

Is there no no_std block_on! ? Can't you just spin polling the Future?

@Be-ing
Copy link
Author

Be-ing commented Feb 12, 2025

I just found embassy_futures::block_on, however, I don't think it's a good idea to use it to impl Drop for async Files/Volumes. We can't know if a user would want that and there's no way to opt out of Drop::drop getting called if it's implemented. For example, the user could have strict power consumption requirements that require putting the MCU to sleep on every .await as Embassy and RTIC do.

@thejpster
Copy link
Member

But it's probably better than a corrupt file system, and any correct program will close them asynchronously before dropping.

What do other async File APIs do?

@Be-ing
Copy link
Author

Be-ing commented Feb 12, 2025

tokio::fs::File:

A file will not be closed immediately when it goes out of scope if there are any IO operations that have not yet completed. To ensure that a file is closed immediately when it is dropped, you should call flush before dropping it. Note that this does not ensure that the file has been fully written to disk; the operating system might keep the changes around in an in-memory buffer. See the sync_all method for telling the OS to write the data to disk.

async-std::fs::File:

Files are automatically closed when they get dropped and any errors detected on closing are ignored. Use the sync_all method before dropping a file if such errors need to be handled.

impl Drop for File in async-std:

impl Drop for File {
    fn drop(&mut self) {
        // We need to flush the file on drop. Unfortunately, that is not possible to do in a
        // non-blocking fashion, but our only other option here is losing data remaining in the
        // write cache. Good task schedulers should be resilient to occasional blocking hiccups in
        // file destructors so we don't expect this to be a common problem in practice.
        let _ = futures_lite::future::block_on(self.flush());
    }
}

Suggestion: impl Drop for async File/Volume using embassy_futures::block_on, but put that behind an async_drop Cargo feature that is enabled by default. This will Just Work by default, and any project where that little bit of blocking is really a problem has an escape hatch.

@Be-ing Be-ing force-pushed the bisync branch 2 times, most recently from 3c49859 to 7b15cf5 Compare February 12, 2025 23:42
@Be-ing
Copy link
Author

Be-ing commented Feb 12, 2025

Suggestion: impl Drop for async File/Volume using embassy_futures::block_on, but put that behind an async_drop Cargo feature that is enabled by default. This will Just Work by default, and any project where that little bit of blocking is really a problem has an escape hatch.

Nevermind, I forgot core::mem::forget exists to opt out of Drop, and of course File/Volume::close already use it. There's no need for a Cargo feature to prevent dropping; that can be opted out of just by calling File/Volume::close.

@Be-ing
Copy link
Author

Be-ing commented Feb 15, 2025

Some blogs about async drop:
https://without.boats/blog/asynchronous-clean-up/
https://sabrinajewson.org/blog/async-drop
https://smallcultfollowing.com/babysteps/blog/2023/03/16/must-move-types/

TL;DR: It's complicated and Rust is a long way from async drop being implemented. Probably not for some more years.

Blocking in Drop is an ugly workaround, but I think it's sufficient for now.

@thejpster
Copy link
Member

You'll need to handle dropping Directories too.

@Be-ing
Copy link
Author

Be-ing commented Feb 15, 2025

Directory::drop doesn't make any async calls, so there's nothing to do there. The existing implementation is already enough.

@yanshay
Copy link

yanshay commented Mar 28, 2025

That's a pretty bad failure mode and makes me wonder if we should have the option to Drop File and Volume. We could panic in those Drop impls with a helpful error messages saying to use close().await instead. Maybe we could make that the default Drop impl and add a Cargo feature embassy-time-generic-queue to opt into the block_on impls of Drop?

Thanks for all the testing and feedback everyone, it's is very useful!

That would have definitely been helpful and save quite some time. I would have never thought of the close if I didn't go through this discussion before and even with that I initially missed the await, only the warning made me notice that.

Other than that, could missing the proper close be the cause for the failures I saw about CardNotFound? The drops come later so I wonder if the failures were due to not closing properly on earlier executions but I think it took place also right after power up of the device, only it doesn't any longer.

@thejpster
Copy link
Member

could missing the proper close be the cause for the failures I saw about CardNotFound

Unlikely. Closing a file just involves writing out a new directory entry with the updated file size. But the card doesn't care what you write to it - it just sees block reads and block writes.

@Be-ing
Copy link
Author

Be-ing commented Mar 29, 2025

Unlikely. Closing a file just involves writing out a new directory entry with the updated file size.

What about the Volume? Could the missing closure of the Volume cause what @yanshay encountered?

@thejpster
Copy link
Member

Nope. Nothing is written to the disk when you close a volume. The open volume just caches some filesystem metadata to save us looking it up. You can verify this by looking at the code that is executed when you close a volume.

@thejpster
Copy link
Member

SD Card initialisation is messy, and I've only tested this crate (and only in sync mode) on a limited number of cards. My recommendation is to strip everything down to the bare minimum, and put a logic analyser on the bus so you can see what's going to and from the card. And try the hardware with a known good SD Card implementation like fatfs. If fatfs works and this crate doesn't, we can look at the traces to try and work out why.

I speak from ~25 years of experience when I say that complex issues are not usually solved by trying things at random. You have to go back to the most basic state you can until you hit something solid, and then build up from there. If nothing seems solid, go lower down.

@yanshay
Copy link

yanshay commented Mar 29, 2025

SD Card initialisation is messy, and I've only tested this crate (and only in sync mode) on a limited number of cards. My recommendation is to strip everything down to the bare minimum, and put a logic analyser on the bus so you can see what's going to and from the card. And try the hardware with a known good SD Card implementation like fatfs. If fatfs works and this crate doesn't, we can look at the traces to try and work out why.

I speak from ~25 years of experience when I say that complex issues are not usually solved by trying things at random. You have to go back to the most basic state you can until you hit something solid, and then build up from there. If nothing seems solid, go lower down.

The thing is that now it's working, not sure what changes I did got it working.

So are you basically saying that the SDCard and not the Board hosting the SDCard is meaningful?
So even if it works for me on a specific SDCard it doesn't guarantee that it will work for someone else using the same board if they place a different brand of SDCard?
It's one thing if it fails for me, I can troubleshoot using LA, but if it happens for someone else using my software on the same device but with a different SDCard it's a whole different story.

@thejpster
Copy link
Member

So even if it works for me on a specific SDCard it doesn't guarantee that it will work for someone else using the same board if they place a different brand of SDCard?

Definitely. Each SD Card is a tiny computer running its own firmware, and they all do things slightly differently.

@Be-ing
Copy link
Author

Be-ing commented Mar 29, 2025

So are you basically saying that the SDCard and not the Board hosting the SDCard is meaningful?

Yes. My motivation for working on this is rewrite the firmware for a little music player gadget in Rust. At first I just wanted to do this for fun, but I've been struggling with bizarre, extreme slowdowns in read speeds with the C++ firmware, but only with one particular SD card. After trying everything I can with that firmware, my only idea left before concluding some interaction between the board and my particular SD card is cursed is trying a completely different software stack.

@avsaase
Copy link

avsaase commented Apr 4, 2025

I think I ran into another problem or limitation. My SD card shares the SPI bus with another device using embassy_embedded_hal::shared_bus::asynch::spi::SpiDevice. When writing to an open file I sometimes get a DeviceError(WriteError), specifically from here: https://github.com/Be-ing/embedded-sdmmc-rs/blob/e7eef2a4a241bbc3fcc1e101b06c98adcc8e17bd/src/sdcard/mod.rs#L247-L249. Since the write operation can be interrupted at any await point, I suspect the shared bus causes a delay between sending the CMD13 command and receiving the response byte and that this causes some sort of timeout in the card. A retry of the write seems to always succeed. So far I haven't seen any partially completed writes but it feels sketchy to rely on this.

I haven't had the time to probe it with a logic analyzer yet so this is all still speculation at this point. I put together this repoducer that triggers the error. The blocking version works fine.

@Be-ing
Copy link
Author

Be-ing commented Apr 4, 2025

SdCardInner::card_command already has a retry loop with a delay... 🤔

        let mut delay = Delay::new_command();
        loop {
            let result = self.read_byte().await?;
            if (result & 0x80) == ERROR_OK {
                return Ok(result);
            }
            delay
                .delay(&mut self.delayer, Error::TimeoutCommand(command))
                .await?;
        }

@Be-ing
Copy link
Author

Be-ing commented Apr 4, 2025

specifically from here: https://github.com/Be-ing/embedded-sdmmc-rs/blob/e7eef2a4a241bbc3fcc1e101b06c98adcc8e17bd/src/sdcard/mod.rs#L247-L249

Are you sure the Error::WriteError isn't coming from the read_byte call just below that?

            if self.card_command(CMD13, 0).await? != 0x00 {
                return Err(Error::WriteError);
            }
            if self.read_byte().await? != 0x00 {
                return Err(Error::WriteError);
            }

@avsaase
Copy link

avsaase commented Apr 4, 2025

Yes. I added some logs like this

if self.card_command(CMD13, 0).await? != 0x00 {
    warn!("CMD13 failed");
    return Err(Error::WriteError);
}
if self.read_byte().await? != 0x00 {
    warn!("Read byte oyther than 0x00");
    return Err(Error::WriteError);
}

and it's consistently the the first one.

@avsaase
Copy link

avsaase commented Apr 4, 2025

I forgot to mention that the reproducer occasionally shows a DeviceError(DiskFull) error as well. I haven't been able to reliably reproduce that one but I figured it's good to know.

@Be-ing
Copy link
Author

Be-ing commented Apr 4, 2025

Is it reproducible with multiple SD cards?

@avsaase
Copy link

avsaase commented Apr 4, 2025

Is it reproducible with multiple SD cards?

I'm seeing the same error with a SanDisk Extreme U3 32GB and an off-brand U1 16GB card. Presumably the SanDisk card is of decent quality.

@thejpster
Copy link
Member

Interesting. Yeah, if the task running the SD card can be stalled for an arbitrary amount of time, we'll need to make sure we're robust to timeouts and make sure we retry things appropriately. That hasn't been an issue for me at all, because I'm single-tasking and using blocking mode.

@avsaase
Copy link

avsaase commented Apr 5, 2025

I found a solution to the WriteError problem by changing the way commands are sent to the card. Instead of writing the command bytes in one operation and then doing multiple single byte reads until (result & 0x80) == ERROR_OK, we can do both the write and the read in a single transfer. After the transfer is complete we scan the buffer for the response bytes. For the async API this is probably a performance improvement because the DMA can do a single transfer which can be awaited. A small downside is this likely clocks out a few more 0xFF bytes from the card than strictly necessary but I don't think that's a problem.

According to this Stack Overflow answer (which gave me this idea), cards always start sending their responses within 1-8 bytes after the command bytes are sent. This puts an upper bound on the number of bytes in the transfer (19 to be precise).

I have a commit here that demonstrates this approach. I confirmed that this solves the problem shown in the reproducer. The blocking version also still works fine. The way I changed the card_command return type is not super elegant but this was necessary because now all the response bytes are read in one go, so they need to be passed back to the caller in case it needs to use the bytes after the first. This can probably be made more strongly typed but I wanted to get some feedback on this before spending too much time on it.

I can make a PR against this branch or feel free to cherry pick the commit.

@thejpster
Copy link
Member

That SO answer is fantastic.

I am worried about the advice to send a dummy 0xFF byte with CS high in order to clear the output register on buggy cards.

Perhaps we should own two SpiDevice handles - one for the card and one for 'not the card'. Then we can bring the dummy byte clocking at start up back in-house.

@avsaase
Copy link

avsaase commented Apr 6, 2025

It's slightly annoying that the SpiDevice trait doesn't have an escape hatch to manually control the CS line for doing non-standard stuff like this.

Perhaps we should own two SpiDevice handles

I guess that could work but would that require the user to use a physical pin for the the "not the card" CS? That might not always be practical.

bring the dummy byte clocking at start up back in-house.

In embedded-fatffs they have a sd_init function that takes a mutable reference to a SpiBus and a CS pin. This function could be copied as is. Unfortunately this pattern can't be used to send the 0xFF byte with CS high because creating the spi device takes ownership of the CS pin.

@avsaase
Copy link

avsaase commented Apr 6, 2025

I gave the dummy spi device approach a try but I encountered two problems:

  1. With the embassy-rp HAL, and I assume other embassy HALs as well, it's not possible to not use a physical pin for the dummy CS. I tried creating my own dummy pin but somewhere in the trait spaghetti you always hit a sealed trait. Maybe there's another way to solve this.
  2. In the async version, even with a dummy spi device there's no guarantee that the 0xFF byte is written directly after the card spi transfer is complete because there's an await point. Depending on timings, another device might use the bus before the 0xFF is written, negating any benefit. I don't see an obvious solution to this.

@thejpster
Copy link
Member

thejpster commented Apr 6, 2025

In embedded-fatffs they have a sd_init function that takes a mutable reference to a SpiBus and a CS pin.

See #126.

If you take the whole bus, no one else can use it?

Edit: oh, I see. They borrow it and only to send the 74 clock cycles. That seems fine, but can you do that whilst an SpiDevice also exists?

@avsaase
Copy link

avsaase commented Apr 6, 2025

Yeah, that function only serves as a convenient way to send the 74 clock cycles. It doesn't solve the problem that we need to do stuff outside of the SpiDevice contract on every operation. I think the only proper way to do this is to extend embedded-hal{-async} to support custom behavior like this but I'm not holding my breath for that to happen.

Btw, I'm not sure if I've ever seen the lack of 0xFF byte cause a problem. Although, from the explanation in that SO answer it sounds like this could cause strange things to happen. This page from the SdFat C library also says it's important.

@Be-ing
Copy link
Author

Be-ing commented Apr 6, 2025

I am worried about the advice to send a dummy 0xFF byte with CS high in order to clear the output register on buggy cards.

If this is working with multiple SPI devices on a bus without doing that, do we need to bother implementing that?

@Be-ing
Copy link
Author

Be-ing commented Apr 6, 2025

In embedded-fatffs they have a sd_init function that takes a mutable reference to a SpiBus and a CS pin. This function could be copied as is. Unfortunately this pattern can't be used to send the 0xFF byte with CS high because creating the spi device takes ownership of the CS pin.

A nice thing about this API is that it allows the caller to get back a mutable reference to the SPI device to change the clock speed after initialization, which the embedded-hal[-async] traits do not provide a means to do.

Comment on lines -504 to -506
if command == CMD12 {
let _result = self.read_byte()?;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't add this check to the buffer scan. Do we know what the value of this byte is? I can't find anything about it in the spec.

Copy link

@Joe-Lannan Joe-Lannan Apr 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my reading this is not an official spec but it is commonly discarded
i.e. http://elm-chan.org/docs/mmc/mmc_e.html
"The received byte immediataly following CMD12 is a stuff byte, it should be discarded prior to receive the response of the CMD12"
Seems unlikely that it will contain 0x80 and it seems other implementations allow this byte to be check anyway like micropython sdcard.py.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, then this is probably fine. Thanks!

Copy link

@Joe-Lannan Joe-Lannan Apr 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually sorry, that example does show that byte being skipped with self.cmd(12, 0, 0xFF, skip1=True). Perhaps this is important for some cards?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im not sure that when issuing CMD12 it is checking for errors well (here or main). If a transfer is not actually stopped then it may get stuck in block reading unexpectedly and not pass an error. But checking for a valid response from CMD12 may not really be an issue here since it will just read a bunch of bytes that should get the card back to a r1 state anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for asynchronous I/O
7 participants