fix(gmail): RFC 2047 encode non-ASCII display names in address headers #482
🦋 Changeset detected
Latest commit: ec9845a
The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses an issue where non-ASCII display names in email address headers (To, From, Cc, Bcc) were sent as raw UTF-8, leading to garbled text (mojibake) in email clients. The core problem was that only the Subject header was being RFC 2047 encoded, while address headers were only sanitized for CRLF. The solution introduces a dedicated function to correctly encode only the non-ASCII display name portions of these headers, ensuring proper rendering across various email clients.
Codecov Report
✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #482      +/-   ##
==========================================
+ Coverage   67.31%   67.57%   +0.25%
==========================================
  Files          40       40
  Lines       17340    17475     +135
==========================================
+ Hits        11673    11808     +135
  Misses       5667     5667
Code Review
This pull request correctly implements RFC 2047 encoding for non-ASCII display names in email address headers. However, the current implementation introduces a critical security vulnerability. The sanitization logic can be bypassed to inject content into address headers (like To, From, Cc), potentially exposing Bcc recipients. I've provided a suggestion to fix this vulnerability in encode_address_header by ensuring the function always reconstructs addresses from parsed components, which strips any injected data.
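The review does not spell out the exact bypass, so the following is only a generic sketch of the class of problem: if a caller-supplied address value contains CR/LF, the text after it is interpreted as a new header line. The helper names `strip_control_chars` and `header_line` are hypothetical and are not taken from this PR.

```rust
/// Hypothetical helper: drop every ASCII control character
/// (CR, LF, NUL, tab, DEL, ...) from a header value.
fn strip_control_chars(value: &str) -> String {
    value.chars().filter(|c| !c.is_ascii_control()).collect()
}

/// Hypothetical helper: emit one header line. The only CRLF in the
/// result is the terminator added here, never one taken from `value`.
fn header_line(name: &str, value: &str) -> String {
    format!("{}: {}\r\n", name, strip_control_chars(value))
}
```

With this in place, a value such as `"a@example.com\r\nBcc: evil@example.com"` can no longer start a second header line. The remaining text is still not a valid address, so a real implementation would presumably also parse the value and rebuild it from the parsed components, as the review suggests.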
/gemini review
Code Review
This pull request correctly addresses an issue with non-ASCII display names in email headers by implementing RFC 2047 encoding. The changes are well-structured and include a comprehensive set of unit tests. However, I've identified a critical security vulnerability in the new encode_address_header function related to header injection in bare email addresses, which needs to be addressed.
Add encode_address_header() that parses mailbox lists, RFC 2047 encodes only the display-name portion of non-ASCII addresses, and leaves email addresses untouched. Applied to all 4 address headers (To, From, Cc, Bcc) in MessageBuilder::build(). Previously, only Subject got RFC 2047 encoding while address headers only got CRLF sanitization, causing mojibake for non-ASCII names. Supersedes #405, #458, #469. Closes #404.
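As a rough illustration of what "RFC 2047 encode only the display-name portion" means, here is a minimal sketch for a single mailbox. It is not the PR's `encode_address_header()`: the function names are hypothetical, it uses the Q encoding (the PR text does not say whether B or Q is used), it handles one mailbox rather than a list, and it does not split encoded-words that exceed the 75-character limit.

```rust
/// Q-encode `text` as one RFC 2047 encoded-word (assumption: UTF-8, Q encoding).
/// Only characters that are safe inside a display-name phrase are left literal.
fn q_encode_word(text: &str) -> String {
    let mut out = String::from("=?UTF-8?Q?");
    for &byte in text.as_bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'!' | b'*' | b'+' | b'-' | b'/' => {
                out.push(byte as char)
            }
            // In the Q encoding an underscore stands for a space.
            b' ' => out.push('_'),
            // Everything else, including each UTF-8 byte, becomes =XX.
            _ => out.push_str(&format!("={:02X}", byte)),
        }
    }
    out.push_str("?=");
    out
}

/// Hypothetical single-mailbox encoder: `Name <addr>` or a bare `addr`.
/// The address part is never modified; ASCII-only input is returned as-is.
fn encode_mailbox_sketch(mailbox: &str) -> String {
    let mailbox = mailbox.trim();
    if mailbox.is_ascii() {
        return mailbox.to_string();
    }
    match (mailbox.rfind('<'), mailbox.rfind('>')) {
        (Some(open), Some(close)) if open < close => {
            let name = mailbox[..open].trim().trim_matches('"');
            let addr = mailbox[open + 1..close].trim();
            if name.is_ascii() {
                // Non-ASCII is in the address, which this sketch leaves alone.
                mailbox.to_string()
            } else {
                format!("{} <{}>", q_encode_word(name), addr)
            }
        }
        // No angle brackets: nothing that can be identified as a display name.
        _ => mailbox.to_string(),
    }
}
```

For example, `encode_mailbox_sketch("José García <jose@example.com>")` yields `=?UTF-8?Q?Jos=C3=A9_Garc=C3=ADa?= <jose@example.com>`, while an ASCII-only mailbox passes through unchanged.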
/gemini review
Code Review
This pull request provides a solid fix for RFC 2047 encoding of non-ASCII display names in email address headers. The new encode_address_header function is well-implemented and the accompanying tests are comprehensive, covering various functional and security-related edge cases.
…lder

Replace custom MessageBuilder, RFC 2047 encoding, header sanitization, and address encoding (including googleworkspace#482) with the mail-builder crate (Stalwart Labs, 0 runtime deps). Each command builds a mail_builder::MessageBuilder directly.

Introduce structured types throughout:
- Mailbox type (parsed display name + email) replaces raw string passing
- sanitize_control_chars strips ASCII control characters (CRLF, null, tab, etc.) at the parse boundary, as defense-in-depth for mail-builder's structured header types, superseding sanitize_header_value, sanitize_component, and encode_address_header from googleworkspace#482
- OriginalMessage fields use Option<T> instead of empty-string sentinels
- parse_original_message returns Result with validation (threadId, From, Message-ID)
- Pre-parsed Config types (SendConfig, ForwardConfig, ReplyConfig) with Vec<Mailbox>: parse at the boundary, not downstream
- parse_forward_args and parse_send_args return Result with --to validation, consistent with parse_reply_args
- parse_optional_mailboxes helper normalizes Some(vec![]) to None for optional address fields (--cc, --bcc, --from)
- Envelope types borrow from Config + OriginalMessage with lifetimes
- Message IDs stored bare (no angle brackets), parsed once at boundary
- References stored as Vec<String> instead of space-separated string
- ThreadingHeaders bundles In-Reply-To + References with debug_assert for bare-ID convention
- Shared CLI arg builders (common_mail_args, common_reply_args) eliminate duplicated --cc/--bcc/--html/--dry-run definitions

Additional improvements:
- finalize_message returns Result instead of panicking via .expect()
- Mailbox::parse_list filters empty-email entries (trailing comma edge case)
- format_email_link percent-encodes mailto hrefs to prevent parameter injection
- Forward date handling: omits Date line when absent instead of showing empty "Date: "
- Dry-run auth: log skipped auth as diagnostic instead of silently discarding errors
- Restore --html tips in after_help strings (gmail_quote CSS, cid: image warnings, HTML fragment advice) lost in release PR googleworkspace#434
- Update execute_method call for upload_content_type parameter (googleworkspace#429)

Delete: MessageBuilder, encode_header_value, sanitize_header_value, encode_address_header, sanitize_component, extract_email, extract_display_name, split_mailbox_list, build_references.
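To make the "parse at the boundary" idea concrete, here is a small sketch of a structured mailbox type together with the two behaviours the commit message calls out: dropping empty-email entries (the trailing comma case) and normalizing an empty list to None. The names mirror those in the commit message, but every signature and body below is an assumption, not the repository's code. In particular, the naive split on `,` would mishandle quoted display names that contain commas.

```rust
/// Assumed shape of the structured type: optional display name plus email.
#[derive(Debug, Clone, PartialEq)]
struct Mailbox {
    name: Option<String>,
    email: String,
}

impl Mailbox {
    /// Parse `Name <addr>` or a bare `addr` (simplified, no RFC 5322 grammar).
    fn parse(raw: &str) -> Mailbox {
        let raw = raw.trim();
        match (raw.rfind('<'), raw.rfind('>')) {
            (Some(open), Some(close)) if open < close => {
                let name = raw[..open].trim().trim_matches('"').trim();
                Mailbox {
                    name: if name.is_empty() { None } else { Some(name.to_string()) },
                    email: raw[open + 1..close].trim().to_string(),
                }
            }
            _ => Mailbox { name: None, email: raw.to_string() },
        }
    }

    /// Parse a comma-separated list, dropping entries with an empty email
    /// (for example the empty entry produced by a trailing comma).
    fn parse_list(raw: &str) -> Vec<Mailbox> {
        raw.split(',')
            .map(Mailbox::parse)
            .filter(|m| !m.email.is_empty())
            .collect()
    }
}

/// Normalize "flag absent" and "flag present but empty" to the same None.
fn parse_optional_mailboxes(raw: Option<&str>) -> Option<Vec<Mailbox>> {
    raw.map(Mailbox::parse_list).filter(|list| !list.is_empty())
}
```

With this shape, downstream code only ever sees `None` or a non-empty `Vec<Mailbox>`, so it does not need to re-check for empty strings or empty lists.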
Summary
Non-ASCII display names in To, From, Cc, and Bcc headers were sent as raw UTF-8, causing mojibake in email clients (e.g. Japanese 下野祐太 appeared garbled, Spanish José García displayed incorrectly).

Root Cause
MessageBuilder::build() applied RFC 2047 encoding (encode_header_value()) only to the Subject header. Address headers (To, From, Cc, Bcc) only got CRLF sanitization via sanitize_header_value(), leaving non-ASCII bytes unencoded.

Fix
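New encode_address_header() function that:
- Parses the mailbox list with split_mailbox_list()
- RFC 2047 encodes only the display-name portion of non-ASCII addresses and leaves email addresses untouched
- Is applied to all four address headers (To, From, Cc, Bcc) in MessageBuilder::build()

Supersedes
#405, #458, #469. Closes #404.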
Tests
- cargo test: 623 passed, 0 failed
- cargo clippy -- -D warnings: clean
- Tests cover encode_address_header() and MessageBuilder