Skip to content

Use RFC-compliant HeaderValue type. #287

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

PiotrSikora
Copy link
Member

When "header-value" feature is enabled, a RFC-compliant HeaderValue
type is used instead of the previously used UTF-8 String type.

Since HTTP field values aren't UTF-8 encoded, using the String type
meant that a plugin would crash when valid, but obsolete, non-UTF-8
characters were present in HTTP headers and/or trailers.

This feature is currently optional to avoid breaking changes and to
help with migration, but it will become a default feature in v0.3.

The HeaderValue type is re-exported from the http crate.

When "header-value" feature is enabled, a RFC-compliant HeaderValue
type is used instead of the previously used UTF-8 String type.

Since HTTP field values aren't UTF-8 encoded, using the String type
meant that a plugin would crash when valid, but obsolete, non-UTF-8
characters were present in HTTP headers and/or trailers.

This feature is currently optional to avoid breaking changes and to
help with migration, but it will become a default feature in v0.3.

The HeaderValue type is re-exported from the http crate.

Signed-off-by: Piotr Sikora <[email protected]>
@PiotrSikora PiotrSikora marked this pull request as ready for review October 28, 2024 22:18
@PiotrSikora PiotrSikora mentioned this pull request Oct 28, 2024
src/hostcalls.rs Outdated
@@ -1159,46 +1193,59 @@ mod utils {
bytes
}

pub(super) fn serialize_map(map: Vec<(&str, &str)>) -> Bytes {
pub(super) fn serialize_map<V>(map: Vec<(&str, V)>) -> Bytes
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to myself: drop Bytes if merged after #286.

This crate supports the following optional features:

- `header-value` - uses RFC-compliant `HeaderValue` instead of UTF-8 `String` for HTTP header and trailer values.
This will become the default in future releases.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this PR doesn't introduce any changes or restrictions on header names in the SDK, since all valid HTTP field names are also valid UTF-8 strings, and proxies are very strict about this, so we never panic deserializing header names.

Having said that, I'm tempted to use HeaderName type while we're here, and turn this into a full typed variant.

Unfortunately, because we're passing HTTP/2 pseudo-headers (:authority, :status, etc.) in the header map, those names are non-compliant with RFC HTTP field names, so we'd need to either:

  1. create wrapper around http::HeaderName type,
  2. create our own more forgiving type,
  3. return a pair of maps, one with pseudo-headers and one with HTTP headers.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unfortunate that http::HeaderName can't accomodate pseudo-headers, since that's a very common case. What does other Rust code do for handling H2 pseudo-headers, just use strings?

If there isn't any good external precedent, maybe (1), in the form of an enum type that can either be an http::HeaderName or a pseudo-header? Or would http::HeaderName plus a pseudoheader bool be sufficient to represent all cases?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unfortunate that http::HeaderName can't accomodate pseudo-headers, since that's a very common case. What does other Rust code do for handling H2 pseudo-headers, just use strings?

Pseudo-headers are a wire-format detail, and they are not intermixed with HTTP headers in any library or language that I'm aware of (other than Envoy), so that's not an issue that other people run into.

If there isn't any good external precedent, maybe (1), in the form of an enum type that can either be an http::HeaderName or a pseudo-header? Or would http::HeaderName plus a pseudoheader bool be sufficient to represent all cases?

http::HeaderName cannot contain : character, so a pseudoheader flag wouldn't work.

My other concern was that even if we "patch" http::HeaderName somehow, we still wouldn't be able to use http::HeaderMap that uses original http::HeaderName as long as pseudo-headers are intermixed with the regular HTTP headers.

I think we should separate them, at the very least at the SDK level, otherwise we're just propagating this Envoy-ism and making it hard-to-impossible to interoperate with the generic HTTP libraries available in various languages.

I'm sure this isn't helpful in Go SDK either.

log = "0.4"

[features]
default = []
header-value = ["dep:bytes", "dep:http"]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feature could be renamed to valid-header, strict-header or rfc-header to make this reusable in case we want to add restrictions on header name in the future.

Alternatively, we could drop this as a feature and instead add those functions as variants with _typed() suffix.

Thoughts?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_typed() suffix would fit alongside the existing _bytes() suffix, so doesn't seem too bad to me.

No objection to keeping it as a feature, though, assuming we find a solution to the HTTP/2 pseudo-headers problem. If you keep it a feature, strict-header or rfc-header SGTM.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_typed() suffix would fit alongside the existing _bytes() suffix, so doesn't seem too bad to me.

Since the original version without suffix is fundamentally broken for the HTTP use case, I think we should drop it sooner rather than later, so I'll stick to the version from the existing revision of this PR and leave it without suffix behind a feature flag.

No objection to keeping it as a feature, though, assuming we find a solution to the HTTP/2 pseudo-headers problem.

Pseudo-header issue is orthogonal to this, since it only affects HTTP header names, and not values (this PR).

If you keep it a feature, strict-header or rfc-header SGTM.

My original thinking was that we need to have a flag for both: values only (this PR) and for both, so that we could add support for http::HeaderName and http::HeaderMap in a separate PR without making it a breaking change.

Having said that, I suspect that adding support for the latter is going to require splitting pseudo-headers and regular HTTP headers into separate maps, so it might be a much bigger change that'd be done as part of a bigger rewrite and not something simply hidden behind a feature flag.

Using http::HeaderValue is much more important, since the UTF-8 issue is causing panics and is pretty much broken.

Using http::HeaderName and http::HeaderMap only gives us better integration with existing Rust's HTTP ecosystem, but it doesn't really fix any issues, so it's a nice-to-have.

So strict-header-value (this PR) and strict-header (once we have support for both)?

}

#[cfg(feature = "header-value")]
pub(super) fn deserialize_map(mut bytes: bytes::Bytes) -> Vec<(String, HeaderValue)> {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: drop bytes:: prefix if merged after #286.

@PiotrSikora
Copy link
Member Author

@mpwarres could we get someone from Google to convert Rust tests in Envoy to use the header-value feature (it's already exposed via @proxy_wasm_rust_sdk//:proxy_wasm_header_value Bazel target), to verify that I didn't miss anything obvious, before we merge this PR? I converted some tests when I originally wrote this PR, which was a while ago, so the basics do work, but it wasn't a rewrite of everything.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants