fix: add UTF-8 BOM handling to file reading operations #5948
base: main
- Created bomUtils module with functions to detect and strip UTF-8 BOM
- Updated extractTextFromFile to strip BOM from text files
- Updated readLines to strip BOM from the first chunk when streaming
- Added comprehensive tests for BOM handling
- Updated readFileTool to treat files with only BOM as empty
- Ensures Windows files with BOM are handled correctly

Addresses the concern raised in #5789 about UTF-8 BOM causing issues on Windows.
@roomote-agent The … Please update this PR (do not create a new PR; commit directly to PR #5948) to:
Commit these changes to the existing branch.
Thank you for the review @daniel-lxs! You are absolutely right: we should use the existing …
- Remove src/utils/bomUtils.ts and its test file
- Update imports in extract-text.ts to use the strip-bom package
- Update imports in read-lines.ts to use the strip-bom package
- Keep the empty file detection logic in readFileTool.ts unchanged

This eliminates code duplication by using the existing strip-bom package, which is already used in DiffViewProvider.ts and CustomModesManager.ts.
Hi @daniel-lxs, I have successfully updated PR #5948 as requested: ✅ Changes made:
✅ All CI checks have passed:
The PR now uses the existing …
Summary
This PR adds proper UTF-8 BOM (Byte Order Mark) handling to the file reading operations in Roo Code. This addresses the concern raised by @pwilkin in issue #5789 about potential UTF-8 BOM issues on Windows.
Problem
Windows applications often add a UTF-8 BOM (3-byte sequence: 0xEF, 0xBB, 0xBF) to the beginning of UTF-8 files. Without proper handling, this can cause:
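As an illustration of the byte sequence described above, here is a minimal TypeScript sketch. `hasUtf8Bom` and `decodeKeepingBom` are hypothetical names, not code from this PR. Decoding uses `TextDecoder` with `ignoreBOM: true`, which keeps the BOM in the output as the character U+FEFF. This mirrors the behavior of Node's `fs.readFile(path, "utf8")`, which does not remove the BOM, so the decoded text no longer compares equal to the visible content.

```typescript
// The UTF-8 BOM is the three-byte signature EF BB BF.
const UTF8_BOM = [0xef, 0xbb, 0xbf] as const;

// Hypothetical helper: true if the raw bytes start with the UTF-8 BOM.
function hasUtf8Bom(bytes: Uint8Array): boolean {
  return (
    bytes.length >= UTF8_BOM.length &&
    UTF8_BOM.every((b, i) => bytes[i] === b)
  );
}

// Hypothetical helper: decode without swallowing the BOM.
// ignoreBOM: true tells TextDecoder to include a leading BOM in the output,
// where it shows up as the invisible character U+FEFF at index 0.
function decodeKeepingBom(bytes: Uint8Array): string {
  return new TextDecoder("utf-8", { ignoreBOM: true }).decode(bytes);
}

const withBom = new Uint8Array([0xef, 0xbb, 0xbf, 0x68, 0x69]); // BOM + "hi"
const text = decodeKeepingBom(withBom);

console.log(hasUtf8Bom(withBom)); // true
console.log(text.length); // 3 (U+FEFF, "h", "i")
console.log(text === "hi"); // false
```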
Solution
Added BOM stripping using the strip-bom package (v5.0.0):
- extract-text.ts: Strips BOM from text files after reading
- read-lines.ts: Strips BOM from the first chunk when streaming files

Updated empty file detection in readFileTool.ts:

Added comprehensive tests:
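The two stripping strategies listed above can be sketched as follows. The PR imports the strip-bom package; to keep this example self-contained, `stripBom` below is a local stand-in that mirrors that package's behavior (drop a leading U+FEFF, otherwise return the string unchanged). `extractText` and `stripBomFromChunks` are hypothetical, simplified versions of the extract-text.ts and read-lines.ts patterns, not the actual Roo Code functions, and stream chunks are modeled as an array of already-decoded strings.

```typescript
// Local stand-in for the strip-bom package: a UTF-8 BOM decodes to U+FEFF,
// so remove that single leading character if present.
function stripBom(text: string): string {
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

// Whole-file case (hypothetical, modeled on extract-text.ts): decode the bytes
// while keeping any BOM, as Node's utf8 decoding does, then strip it.
function extractText(bytes: Uint8Array): string {
  const decoded = new TextDecoder("utf-8", { ignoreBOM: true }).decode(bytes);
  return stripBom(decoded);
}

// Streaming case (hypothetical, modeled on read-lines.ts): a BOM is only
// meaningful at the very start of a file, so only the first chunk is stripped
// and every later chunk passes through untouched.
function stripBomFromChunks(chunks: string[]): string[] {
  return chunks.map((chunk, index) => (index === 0 ? stripBom(chunk) : chunk));
}

const fileBytes = new Uint8Array([0xef, 0xbb, 0xbf, 0x61, 0x0a, 0x62]); // BOM + "a\nb"
console.log(JSON.stringify(extractText(fileBytes))); // "a\nb"
console.log(stripBomFromChunks(["\uFEFFline 1\n", "line 2\n"]));
```

Stripping only the first chunk keeps the streaming path cheap and leaves any U+FEFF that appears later in the file exactly as it was.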
Impact on AI Model
This PR does NOT change the content sent to the AI model for files with actual content. The changes only:
For non-empty files, the AI receives the exact same content as before, just without the invisible BOM prefix.
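A minimal sketch of both guarantees (BOM-only files count as empty; other files keep their content minus the BOM), assuming "empty" means zero characters once the BOM is removed. `isEffectivelyEmpty` is a hypothetical name; the real check in readFileTool.ts may be written differently. `stripBom` is again a local stand-in for the strip-bom package.

```typescript
// Local stand-in for the strip-bom package.
function stripBom(text: string): string {
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

// Hypothetical helper: assumes a file counts as empty when nothing is left
// after the BOM is stripped, so a BOM-only file and a 0-byte file agree.
function isEffectivelyEmpty(decoded: string): boolean {
  return stripBom(decoded).length === 0;
}

console.log(isEffectivelyEmpty("\uFEFF")); // true (BOM only)
console.log(isEffectivelyEmpty("")); // true (0-byte file)
console.log(isEffectivelyEmpty("\uFEFFx")); // false
// Non-empty content is unchanged apart from the invisible prefix.
console.log(stripBom("\uFEFFconst x = 1;") === "const x = 1;"); // true
```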
Testing
Related Issues
Implementation Details
The BOM stripping happens transparently during file reading, so:
cc @daniel-lxs @pwilkin