Skip to content

Uplift Utf8Bytes #783

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Conversation

coolreader18
Copy link
Contributor

@coolreader18 coolreader18 commented Apr 12, 2025

This adds a Bytes wrapper with a UTF-8 validity invariant, uplifting the type found as:

  • bytes_utils::Str (2.4M downloads/month)
  • bytestring::ByteString (1.7M downloads/month)
  • tungstenite::Utf8Bytes
  • pub(crate) http::ByteStr
  • axum::extract::ws::Utf8Bytes (itself a wrapper around tungstenite::Utf8Bytes, so as to not depend on tungstenite in the public API)
  • pub(crate) h2::hpack::ByteStr
  • async_nats::Subject
  • and probably elsewhere.

My approach here was to just copy all the API surface from Bytes that could be adapted (as well as a function based on String::from_utf8()), but if this is too big to merge all at once, I could take out some of the methods and add them in a followup PR. I didn't directly copy anything from any of the aforementioned crates, in case that matters wrt licensing, though I did look a little bit at how bytestring phrased their docs.

The Hash impl delegates to str as Hash rather than Bytes as Hash, meaning it's not hash-compatible with Bytes. This differs from http::ByteStr and tungstenite::Utf8Bytes, but it's required, since Utf8Bytes implements Borrow<str> (and so does tungstenite::Utf8Bytes, which means that was probably an oversight).

wrto the name: Utf8Bytes seemed to me the most straightforward description of what it is, and the axum case is what spurred me to open this. ByteString seems like it could imply that std::string::String isn't composed of bytes (or that it's something like bstr; not necessarily utf8), and BytesString is a bit clunky with the geminated s. Str is simple like Bytes is, but doesn't match up with OsStr or CStr, which are slices and not owned buffers. (Edit: another option is just String, with the idea that it would primarily be referred to as bytes::String. That's probably a bad idea though). However, I'm not dead set on anything, and would be totally fine with renaming it if that's desired.

@paolobarbolini
Copy link
Contributor

paolobarbolini commented Apr 16, 2025

FWIW I think this is a good idea. Other than tungstenite and http, which have already been mentioned above, I'll list a few more crates that have their own internal ByteStr type or equivalent to drive home the point:

  1. h2
  2. async-nats

Copy link
Contributor

@paolobarbolini paolobarbolini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left a small nitpick.

@coolreader18
Copy link
Contributor Author

I'm not sure what the procedure is here, but would it be possible to get another review on this?

@Darksonn
Copy link
Contributor

I think it would be good for bytes to provide this, but we need to get the API perfectly right on first try when adding it to the bytes crate as breaking changes are impossible. It may be better to start out by placing this in a downstream "blessed" crate and moving it into bytes later.

@coolreader18
Copy link
Contributor Author

I do think that seems reasonable, but also, almost the whole API proposed here is basically a subset of that of Bytes withs/[u8]/str, plus a few things based on String (from_utf8, from_utf8_unchecked, as_str, into_bytes, TryFrom<Bytes>). I was intentionally trying to innovate as little as possible with this, with the idea that these functions and impls are well established already by either Bytes or std. I also think bytes_utils and bytestring have kinda already functioned as that testing ground - certainly perhaps more data could be collected about what needs improved regarding the API, but I don't think a full nother crate needs to be released for that. As I said, I'm happy to remove some API surface for an initial impl, but I think even just having something with Clone, Deref, from_utf8_unchecked and into_bytes would help the ecosystem start to unify on a single type.

@Darksonn
Copy link
Contributor

I could buy that argument, but you are innovating in the name of the struct. ;)

@seanmonstar
Copy link
Member

To add a counter point, I have "needed" such a struct in several crates, but it hasn't bothered me at all to include the tiny amount of code to make a private struct.

So I haven't felt a need for a unified type in the ecosystem.

@Darksonn
Copy link
Contributor

I mean, I can see the point that downstream crates would like to avoid using unsafe, which you need to make a nice BytesString wrapper.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants