Copilot AI commented Nov 1, 2025

Fixed multi-byte encodings (Encoding.Unicode, Encoding.UTF32) can enable request smuggling when used with the HTTP header encoding selectors. Their binary representation can inject protocol control characters (e.g., a character whose encoded bytes include one that is interpreted as \n).
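The failure mode can be checked byte by byte. The C# sketch below assumes .NET 5 or later with top-level statements. It encodes one character, U+1F60A (😊), chosen here purely for illustration, and reports whether a 0x0A (line feed) byte appears. In UTF-16 the character is the surrogate pair D83D DE0A, so its little-endian bytes are 3D D8 0A DE. UTF-32LE yields 0A F6 01 00. UTF-8 never emits bytes below 0x80 inside a multi-byte sequence, so a non-ASCII character cannot produce a stray line feed.

```csharp
using System;
using System.Text;

// Illustrative value: U+1F60A. In UTF-16 it is the surrogate pair D83D DE0A.
const string Value = "\U0001F60A";

byte[] utf16 = Encoding.Unicode.GetBytes(Value); // UTF-16LE; GetBytes emits no byte-order mark
byte[] utf32 = Encoding.UTF32.GetBytes(Value);   // UTF-32LE; GetBytes emits no byte-order mark
byte[] utf8 = Encoding.UTF8.GetBytes(Value);

// A 0x0A byte would be read by an HTTP/1.1 parser as a line feed ending the header line.
static bool ContainsLineFeed(byte[] bytes) => Array.IndexOf(bytes, (byte)'\n') >= 0;

Console.WriteLine($"UTF-16: {BitConverter.ToString(utf16)} LF={ContainsLineFeed(utf16)}");
Console.WriteLine($"UTF-32: {BitConverter.ToString(utf32)} LF={ContainsLineFeed(utf32)}");
Console.WriteLine($"UTF-8:  {BitConverter.ToString(utf8)} LF={ContainsLineFeed(utf8)}");
```

Run as a console program, this prints LF=True for UTF-16 and UTF-32 and LF=False for UTF-8.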

Changes

Added detailed remarks to RequestHeaderEncodingSelector and ResponseHeaderEncodingSelector documenting:

  • Safe encodings: Encoding.ASCII, Encoding.Latin1, or Encoding.UTF8
  • CAUTION callout: Fixed multi-byte encodings must never be used, because their binary representation can be misinterpreted and break the HTTP protocol
  • Concrete example: How the bytes of an emoji can be interpreted as a newline that terminates the header value
  • Caller responsibility: Ensuring that values are representable in the chosen encoding and that the server agrees on it, to prevent silent corruption
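A caller who wants to enforce the safe-encoding rule in code could wrap a selector so it refuses anything other than the three recommended encodings. `IsAllowedHeaderEncoding` and `Guard` below are hypothetical helpers, not System.Net.Http APIs. They assume the allowed encodings can be recognized by code page (20127 for ASCII, 28591 for Latin1, 65001 for UTF-8). `HeaderEncodingSelector<HttpRequestMessage>` is the delegate type of both SocketsHttpHandler selector properties (.NET 5 or later).

```csharp
using System;
using System.Net.Http;
using System.Text;

// Hypothetical helper (not a System.Net.Http API): true only for ASCII, Latin1, or UTF-8.
static bool IsAllowedHeaderEncoding(Encoding encoding) =>
    encoding.CodePage is 20127 /* ASCII */ or 28591 /* Latin1 */ or 65001 /* UTF-8 */;

// Hypothetical helper: wraps a selector and rejects any other encoding it returns.
static HeaderEncodingSelector<HttpRequestMessage> Guard(HeaderEncodingSelector<HttpRequestMessage> inner) =>
    (headerName, request) =>
    {
        var encoding = inner(headerName, request);

        // null asks the handler to use its default encoding, so it passes through unchanged.
        if (encoding is null || IsAllowedHeaderEncoding(encoding))
        {
            return encoding;
        }

        throw new InvalidOperationException(
            $"Encoding '{encoding.WebName}' must not be used for header '{headerName}'.");
    };

// Usage: wrap each selector before assigning it to the handler.
var handler = new SocketsHttpHandler
{
    RequestHeaderEncodingSelector = Guard((name, request) => Encoding.UTF8),
    ResponseHeaderEncodingSelector = Guard((name, request) => Encoding.UTF8),
};
```

Checking the code page rather than the instance accepts any UTF-8 configuration (for example, with or without a byte-order mark). Throwing is a design choice in this sketch: it turns a silent protocol problem into a visible failure.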

Example

// ❌ DANGEROUS - Unicode bytes can break protocol
handler.RequestHeaderEncodingSelector = (name, request) => Encoding.Unicode;

// ✅ SAFE - Use ASCII, Latin1, or UTF8
handler.RequestHeaderEncodingSelector = (name, request) => Encoding.UTF8;
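In practice a selector usually opts in per header rather than for every header. The sketch below assumes a hypothetical custom header, `X-Display-Name`, that client and server have agreed to encode as UTF-8. Returning null leaves every other header on the handler's default encoding.

```csharp
using System;
using System.Net.Http;
using System.Text;

// Hypothetical header that client and server have agreed to encode as UTF-8.
const string DisplayNameHeader = "X-Display-Name";

var handler = new SocketsHttpHandler
{
    // Header names are case-insensitive, so compare them that way.
    // Returning null keeps the handler's default encoding for every other header.
    RequestHeaderEncodingSelector = (name, request) =>
        string.Equals(name, DisplayNameHeader, StringComparison.OrdinalIgnoreCase) ? Encoding.UTF8 : null,
    ResponseHeaderEncodingSelector = (name, request) =>
        string.Equals(name, DisplayNameHeader, StringComparison.OrdinalIgnoreCase) ? Encoding.UTF8 : null,
};

using var client = new HttpClient(handler);
```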

The documentation warns that mismatched encodings between client and server can cause silent data corruption (e.g., Latin1 client with UTF-8 server).
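The Latin1/UTF-8 mismatch can be reproduced without any network traffic. This sketch assumes .NET 5 or later (where `Encoding.Latin1` is available). It encodes an illustrative value the way a UTF-8 server would and decodes it the way a Latin1-configured client would.

```csharp
using System;
using System.Text;

// Illustrative header value "café"; the server encodes header values as UTF-8.
const string Original = "caf\u00E9";
byte[] wireBytes = Encoding.UTF8.GetBytes(Original); // 63-61-66-C3-A9

// Every byte sequence is valid Latin1, so a client decoding as Latin1 never fails...
string decodedAsLatin1 = Encoding.Latin1.GetString(wireBytes);

// ...but C3 A9 becomes two characters, U+00C3 and U+00A9 ("cafÃ©"), instead of "é".
Console.WriteLine(decodedAsLatin1 == Original);                    // False
Console.WriteLine(decodedAsLatin1 == "caf\u00C3\u00A9");           // True

// Decoding with the encoding the server actually used round-trips correctly.
Console.WriteLine(Encoding.UTF8.GetString(wireBytes) == Original); // True
```

No exception is raised at any step, which is why the corruption is silent.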

Original prompt

This section details the original issue you should resolve

<issue_title>Document that fixed multi-byte encodings aren't appropriate for HTTP header encoding selectors</issue_title>
<issue_description>Users are generally expected to use one of the following encoding schemes: Encoding.ASCII, Encoding.Latin1, or Encoding.UTF8.
Other encoding schemes may be susceptible to request smuggling attacks. Fixed multi-byte schemes in particular (e.g. Encoding.Unicode, Encoding.UTF32) should never be used, as their binary representation is likely to be misinterpreted and break the protocol (e.g. part of a character's encoded bytes is interpreted as a new line \n that terminates the header value).

When specifying an encoding, it is the caller's responsibility to ensure that the values are representable via said encoding, and that the server agrees on the encoding used. Otherwise, values will become corrupted.
For example, since any set of bytes is considered a valid Latin1 sequence, a server replying with UTF-8 encoded header values may see its responses silently corrupted.</issue_description>

<agent_instructions>Document that such encodings must not be used with SocketsHttpHandler.RequestHeaderEncodingSelector and SocketsHttpHandler.ResponseHeaderEncodingSelector.</agent_instructions>

Comments on the Issue (you are @copilot in this section)



Copilot AI changed the title from "[WIP] Document fixed multi-byte encodings and HTTP header safety" to "Document security risks of fixed multi-byte encodings in HTTP header encoding selectors" Nov 1, 2025
Copilot AI requested a review from MihaZupan November 1, 2025 06:05
Copilot finished work on behalf of MihaZupan November 1, 2025 06:05
@MihaZupan MihaZupan marked this pull request as ready for review November 1, 2025 06:07
@MihaZupan MihaZupan requested a review from a team as a code owner November 1, 2025 06:07
Copilot AI review requested due to automatic review settings November 1, 2025 06:07

Copilot AI left a comment

Pull Request Overview

This PR adds comprehensive documentation for the RequestHeaderEncodingSelector and ResponseHeaderEncodingSelector properties in the SocketsHttpHandler class. The documentation provides guidance on safe encoding schemes and warns against security vulnerabilities related to encoding choices.

Key Changes:

  • Added detailed remarks sections for both request and response header encoding selector properties
  • Documented recommended encoding schemes (ASCII, Latin1, UTF8) and security implications of other schemes
  • Included caution about fixed multi-byte encoding schemes that could break protocol integrity

@MihaZupan MihaZupan merged commit 8e994a2 into main Nov 3, 2025
15 checks passed
@MihaZupan MihaZupan deleted the copilot/fix-http-header-encoding branch November 3, 2025 13:51
Development

Successfully merging this pull request may close these issues.

Document that fixed multi-byte encodings aren't appropriate for HTTP header encoding selectors