
fix(gmail): RFC 2047 encode non-ASCII display names in address headers #482

Merged
jpoehnelt merged 1 commit into main from fix/rfc-2047
Mar 13, 2026


@jpoehnelt
Member

Summary

Non-ASCII display names in To, From, Cc, and Bcc headers were sent as raw UTF-8, causing mojibake in email clients (e.g. Japanese 下野祐太 appeared garbled, Spanish José García displayed incorrectly).

Root Cause

MessageBuilder::build() applied RFC 2047 encoding (encode_header_value()) only to the Subject header. Address headers (To, From, Cc, Bcc) only got CRLF sanitization via sanitize_header_value(), leaving non-ASCII bytes unencoded.

Fix

Adds a new encode_address_header() function that:

  1. Parses a comma-separated mailbox list using the existing split_mailbox_list()
  2. Extracts the display name and email address of each mailbox using existing helpers
  3. RFC 2047 Base64-encodes only non-ASCII display names
  4. Leaves email addresses and ASCII display names untouched

The function is applied to all 4 address headers in MessageBuilder::build().

Example

Before: From: José García <jose@example.com>  → mojibake
After:  From: =?UTF-8?B?Sm9zw6kgR2FyY8OtYQ==?= <jose@example.com>  → correct
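
The encoding described above can be illustrated with a minimal, dependency-free sketch. This is not the code merged in this PR: split_mailboxes, encode_mailbox, base64_encode, and encode_address_header_sketch are hypothetical names, and the parsing is deliberately simplified. It also assumes display names short enough to fit in a single encoded-word; RFC 2047 limits each encoded-word to 75 characters, which a real implementation would need to handle by splitting.

```rust
// Illustrative sketch only; all function names here are hypothetical.
const B64: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard Base64 with padding, hand-rolled to avoid external crates.
fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        let b1 = *chunk.get(1).unwrap_or(&0);
        let b2 = *chunk.get(2).unwrap_or(&0);
        let n = (u32::from(chunk[0]) << 16) | (u32::from(b1) << 8) | u32::from(b2);
        out.push(B64[(n >> 18) as usize & 63] as char);
        out.push(B64[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 { B64[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { B64[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Splits a mailbox list on commas that are not inside double quotes.
fn split_mailboxes(header: &str) -> Vec<&str> {
    let mut parts = Vec::new();
    let (mut start, mut in_quotes) = (0, false);
    for (i, c) in header.char_indices() {
        match c {
            '"' => in_quotes = !in_quotes,
            ',' if !in_quotes => {
                parts.push(&header[start..i]);
                start = i + 1;
            }
            _ => {}
        }
    }
    parts.push(&header[start..]);
    parts.into_iter().map(str::trim).filter(|p| !p.is_empty()).collect()
}

/// Encodes the display name of one mailbox if it contains non-ASCII characters.
fn encode_mailbox(mailbox: &str) -> String {
    // A bare address has no display name, so there is nothing to encode.
    let Some(open) = mailbox.rfind('<') else {
        return mailbox.to_string();
    };
    let name = mailbox[..open].trim().trim_matches('"');
    if name.is_ascii() {
        // ASCII (or empty) display names pass through unchanged.
        return mailbox.to_string();
    }
    let addr = mailbox[open + 1..].trim_end_matches('>').trim();
    format!("=?UTF-8?B?{}?= <{addr}>", base64_encode(name.as_bytes()))
}

/// Applies the per-mailbox encoding to a whole address header value.
fn encode_address_header_sketch(header: &str) -> String {
    split_mailboxes(header)
        .into_iter()
        .map(encode_mailbox)
        .collect::<Vec<_>>()
        .join(", ")
}
```

Under these assumptions, passing `José García <jose@example.com>` to encode_address_header_sketch yields the "After" value shown above, while `alice@example.com` and `Alice <alice@example.com>` come back unchanged.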

Supersedes

Tests

  • cargo test — 623 passed, 0 failed
  • cargo clippy -- -D warnings — clean
  • 5 new unit tests for encode_address_header()
  • 1 integration test verifying all 4 headers in MessageBuilder

@changeset-bot

changeset-bot bot commented Mar 13, 2026

🦋 Changeset detected

Latest commit: ec9845a

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@googleworkspace/cli Patch

@googleworkspace-bot googleworkspace-bot added the area: core Core CLI parsing, commands, error handling, utilities label Mar 13, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses an issue where non-ASCII display names in email address headers (To, From, Cc, Bcc) were sent as raw UTF-8, leading to garbled text (mojibake) in email clients. The core problem was that only the Subject header was being RFC 2047 encoded, while address headers were only sanitized for CRLF. The solution introduces a dedicated function to correctly encode only the non-ASCII display name portions of these headers, ensuring proper rendering across various email clients.

Highlights

  • Non-ASCII Display Name Encoding: Implemented RFC 2047 encoding for non-ASCII display names in email address headers (To, From, Cc, Bcc) to prevent mojibake.
  • New encode_address_header Function: Introduced a new helper function encode_address_header responsible for parsing mailbox lists, identifying non-ASCII display names, and applying RFC 2047 Base64 encoding only to those names, while leaving email addresses and ASCII display names untouched.
  • MessageBuilder Integration: Integrated the new encode_address_header function into MessageBuilder::build() to ensure all relevant address headers (To, From, Cc, Bcc) are correctly encoded before email transmission.

Changelog

  • .changeset/rfc-2047-address-headers.md
    • Added a new changeset file documenting the RFC 2047 encoding fix for address headers.
  • src/helpers/gmail/mod.rs
    • Introduced encode_address_header function to handle RFC 2047 encoding for non-ASCII display names in email addresses.
    • Modified MessageBuilder::build to apply encode_address_header to the 'To', 'From', 'Cc', and 'Bcc' headers.
    • Added five new unit tests for encode_address_header covering various scenarios (bare email, ASCII, non-ASCII, mixed, quoted non-ASCII).
    • Added one integration test for MessageBuilder to verify correct encoding across all four address headers.

Activity

  • cargo test passed with 623 tests.
  • cargo clippy -- -D warnings ran clean.
  • 5 new unit tests were added for the encode_address_header() function.
  • 1 integration test was added to verify all four address headers in MessageBuilder.

@googleworkspace-bot googleworkspace-bot added the cla: yes This human has signed the Contributor License Agreement. label Mar 13, 2026
@codecov

codecov bot commented Mar 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 67.57%. Comparing base (47afe5f) to head (ec9845a).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #482      +/-   ##
==========================================
+ Coverage   67.31%   67.57%   +0.25%     
==========================================
  Files          40       40              
  Lines       17340    17475     +135     
==========================================
+ Hits        11673    11808     +135     
  Misses       5667     5667              


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly implements RFC 2047 encoding for non-ASCII display names in email address headers. However, the current implementation introduces a critical security vulnerability. The sanitization logic can be bypassed to inject content into address headers (like To, From, Cc), potentially exposing Bcc recipients. I've provided a suggestion to fix this vulnerability in encode_address_header by ensuring the function always reconstructs addresses from parsed components, which strips any injected data.
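
The review does not show the exact bypass, so the following is only a hedged sketch of the general idea of rebuilding an address from parsed components rather than echoing the input. rebuild_mailbox and strip_controls are hypothetical names, not functions from this repository, and the parsing rules (last `<` starts the address, first whitespace-delimited token is the address) are simplifying assumptions.

```rust
// Illustrative sketch only; names and parsing rules are assumptions.

/// Drops control characters such as CR, LF, NUL, and tab.
fn strip_controls(s: &str) -> String {
    s.chars().filter(|c| !c.is_control()).collect()
}

/// Rebuilds `Name <addr>` (or a bare `addr`) from parsed parts.
/// Anything that is not part of the name or the address is discarded.
fn rebuild_mailbox(raw: &str) -> Option<String> {
    let (name, addr) = match raw.rfind('<') {
        Some(open) => {
            // Keep only what sits inside the angle brackets.
            let addr = raw[open + 1..].split('>').next().unwrap_or("");
            (raw[..open].trim(), addr)
        }
        None => ("", raw),
    };
    // An address contains no whitespace, so keep the first token only.
    // This drops text smuggled in after a CRLF in a bare address.
    let addr = strip_controls(addr.split_whitespace().next()?);
    if !addr.contains('@') {
        return None;
    }
    let name = strip_controls(name);
    let name = name.trim();
    Some(if name.is_empty() {
        addr
    } else {
        format!("{name} <{addr}>")
    })
}
```

With this sketch, an input such as `"victim@example.com\r\nBcc: attacker@example.com"` is reduced to `victim@example.com`, because only the first address token survives reconstruction.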

@github-actions github-actions bot added the gemini: reviewed Gemini Code Assist has reviewed the latest changes label Mar 13, 2026
@github-actions github-actions bot removed the gemini: reviewed Gemini Code Assist has reviewed the latest changes label Mar 13, 2026
@googleworkspace-bot
Copy link
Collaborator

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly addresses an issue with non-ASCII display names in email headers by implementing RFC 2047 encoding. The changes are well-structured and include a comprehensive set of unit tests. However, I've identified a critical security vulnerability in the new encode_address_header function related to header injection in bare email addresses, which needs to be addressed.

@github-actions github-actions bot added the gemini: reviewed Gemini Code Assist has reviewed the latest changes label Mar 13, 2026
Add encode_address_header() that parses mailbox lists, RFC 2047
encodes only the display-name portion of non-ASCII addresses, and
leaves email addresses untouched. Applied to all 4 address headers
(To, From, Cc, Bcc) in MessageBuilder::build().

Previously, only Subject got RFC 2047 encoding while address headers
only got CRLF sanitization, causing mojibake for non-ASCII names.

Supersedes #405, #458, #469. Closes #404.
@github-actions github-actions bot removed the gemini: reviewed Gemini Code Assist has reviewed the latest changes label Mar 13, 2026
@googleworkspace-bot
Copy link
Collaborator

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request provides a solid fix for RFC 2047 encoding of non-ASCII display names in email address headers. The new encode_address_header function is well-implemented and the accompanying tests are comprehensive, covering various functional and security-related edge cases.

@github-actions github-actions bot added the gemini: reviewed Gemini Code Assist has reviewed the latest changes label Mar 13, 2026
@jpoehnelt jpoehnelt merged commit c61b9cb into main Mar 13, 2026
34 checks passed
@jpoehnelt jpoehnelt deleted the fix/rfc-2047 branch March 13, 2026 23:21
malob added a commit to malob/cli that referenced this pull request Mar 15, 2026
…lder

Replace custom MessageBuilder, RFC 2047 encoding, header sanitization,
and address encoding (including googleworkspace#482) with the mail-builder crate
(Stalwart Labs, 0 runtime deps). Each command builds a
mail_builder::MessageBuilder directly.

Introduce structured types throughout:
- Mailbox type (parsed display name + email) replaces raw string passing
- sanitize_control_chars strips ASCII control characters (CRLF, null,
  tab, etc.) at the parse boundary — defense-in-depth for mail-builder's
  structured header types, superseding sanitize_header_value,
  sanitize_component, and encode_address_header from googleworkspace#482
- OriginalMessage fields use Option<T> instead of empty-string sentinels
- parse_original_message returns Result with validation (threadId, From,
  Message-ID)
- Pre-parsed Config types (SendConfig, ForwardConfig, ReplyConfig) with
  Vec<Mailbox> — parse at the boundary, not downstream
- parse_forward_args and parse_send_args return Result with --to
  validation, consistent with parse_reply_args
- parse_optional_mailboxes helper normalizes Some(vec![]) to None for
  optional address fields (--cc, --bcc, --from)
- Envelope types borrow from Config + OriginalMessage with lifetimes
- Message IDs stored bare (no angle brackets), parsed once at boundary
- References stored as Vec<String> instead of space-separated string
- ThreadingHeaders bundles In-Reply-To + References with debug_assert
  for bare-ID convention
- Shared CLI arg builders (common_mail_args, common_reply_args)
  eliminate duplicated --cc/--bcc/--html/--dry-run definitions

Additional improvements:
- finalize_message returns Result instead of panicking via .expect()
- Mailbox::parse_list filters empty-email entries (trailing comma edge
  case)
- format_email_link percent-encodes mailto hrefs to prevent parameter
  injection
- Forward date handling: omits Date line when absent instead of showing
  empty "Date: "
- Dry-run auth: log skipped auth as diagnostic instead of silently
  discarding errors
- Restore --html tips in after_help strings (gmail_quote CSS, cid:
  image warnings, HTML fragment advice) lost in release PR googleworkspace#434
- Update execute_method call for upload_content_type parameter (googleworkspace#429)

Delete: MessageBuilder, encode_header_value, sanitize_header_value,
encode_address_header, sanitize_component, extract_email,
extract_display_name, split_mailbox_list, build_references.
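
To make the "parse at the boundary" idea in the commit message above more concrete, here is a rough sketch of what a Mailbox type with parse_list and a parse_optional_mailboxes helper could look like. The names come from the commit message, but the fields, signatures, and parsing rules are assumptions for illustration and are not taken from malob/cli. In particular, the comma split below is naive and does not handle quoted display names that contain commas.

```rust
// Illustrative sketch only; fields and signatures are assumptions.

#[derive(Debug, Clone, PartialEq, Eq)]
struct Mailbox {
    /// Display name, if one was given.
    name: Option<String>,
    /// Bare email address without angle brackets.
    email: String,
}

impl Mailbox {
    /// Parses `Name <addr>` or a bare `addr`.
    fn parse(raw: &str) -> Mailbox {
        let raw = raw.trim();
        match raw.rfind('<') {
            Some(open) => {
                let name = raw[..open].trim().trim_matches('"').trim();
                let email = raw[open + 1..].trim_end_matches('>').trim();
                Mailbox {
                    name: (!name.is_empty()).then(|| name.to_string()),
                    email: email.to_string(),
                }
            }
            None => Mailbox { name: None, email: raw.to_string() },
        }
    }

    /// Parses a comma-separated list and drops entries with an empty
    /// email, which covers the trailing-comma edge case.
    fn parse_list(raw: &str) -> Vec<Mailbox> {
        raw.split(',')
            .map(Mailbox::parse)
            .filter(|m| !m.email.is_empty())
            .collect()
    }
}

/// Normalizes an absent or empty list to None for optional address fields.
fn parse_optional_mailboxes(raw: Option<&str>) -> Option<Vec<Mailbox>> {
    raw.map(Mailbox::parse_list).filter(|list| !list.is_empty())
}
```

In this sketch, `Mailbox::parse_list("a@example.com, ")` returns a single entry, and `parse_optional_mailboxes(Some(""))` returns None, so downstream code only has to handle "no recipients" in one form.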

Labels

  • area: core (Core CLI parsing, commands, error handling, utilities)
  • cla: yes (This human has signed the Contributor License Agreement.)
  • gemini: reviewed (Gemini Code Assist has reviewed the latest changes)

Development

Successfully merging this pull request may close these issues.

Gmail draft update: non-ASCII CC/From/BCC headers not RFC 2047 encoded (mojibake)

3 participants