fix: prevent file corruption when ID3 bytes appear in audio data #185

realies · 2025-07-10T16:43:29Z

Summary

This PR fixes a critical bug that can cause severe file corruption when using removeTags() followed by write() on MP3 files that contain the bytes "ID3" in their audio data.

The Problem

The current implementation of getFramePosition() searches for "ID3" patterns throughout the entire buffer, not just at the beginning where ID3v2 tags are supposed to be. This can lead to catastrophic data loss when:

An MP3 file's audio data happens to contain bytes that spell "ID3"
These bytes accidentally pass the ID3 header validation
The library mistakes this for a real ID3 tag and removes large chunks of audio data

Real-world impact:

File size reduced from 82 MB to 15.1 MB (81.5% data loss!)
~54 minute file down to ~15 minute file
Data permanently lost

The Solution

This PR modifies getFramePosition() to only look for ID3 tags at position 0 (the beginning of the buffer), which aligns with the ID3v2 specification that mandates ID3v2 tags MUST be at the beginning of files.

Changes:

// Before: Searched entire buffer for "ID3"
do {
    framePosition = buffer.indexOf("ID3", framePosition + 1)
    // ... validation
} while (framePosition !== -1 && !frameHeaderValid)

// After: Only check position 0
if (buffer.length >= 3 && buffer.slice(0, 3).toString() === "ID3") {
    framePosition = 0
    frameHeaderValid = this.isValidID3Header(buffer.slice(0, 10))
}

Testing

Added comprehensive test cases in test/test.js under the #removeTagsFromBuffer() section that:

Verify files with "ID3" in audio data are not corrupted
Ensure ID3 tags are only detected at position 0
Confirm normal ID3v2 tag reading still works correctly

All 74 tests pass, including the new corruption prevention tests.

Compatibility

This fix maintains 100% compatibility with all ID3v2 versions:

✅ ID3v2.2
✅ ID3v2.3
✅ ID3v2.4

ID3v1 is unaffected as node-id3 does not support ID3v1 tags (they use "TAG" not "ID3").

Root Cause Analysis

The bug occurs because:

removeTags() correctly removes ID3 tags from the file
write() reads the file and calls removeTagsFromBuffer() again
This second call finds "ID3" bytes in the MP3 audio data
If these bytes pass validation, massive chunks of audio are removed

Example from affected file:

Position 15,873,532 contained: 49 44 33 04 00 3d 30 32 01 33
This passed validation as an ID3v4 tag claiming to be 96.8 MB
Result: 66.8 MB of audio data deleted

References

ID3v2.3 specification: http://id3.org/id3v2.3.0
ID3v2.4 specification: http://id3.org/id3v2.4.0-structure

Both specifications clearly state that ID3v2 tags are prepended to files (i.e., at position 0).

Breaking Changes

None. This is a bug fix that prevents incorrect behavior while maintaining all existing functionality.

Zazama · 2025-07-10T17:26:38Z

ID3v2 specification that mandates ID3v2 tags MUST be at the beginning of files.

Sadly, this is not correct and there's a section in your linked documents describing that ID3v2.4 headers don't have to be at the beginning of a file (Section 5).

The default location of an ID3v2 tag is prepended to the audio so
that players can benefit from the information when the data is
streamed. It is however possible to append the tag, or make a
prepend/append combination.

Even the 2.3 document doesn't clearly say it has to be at the beginning, it only states it should be.

The ID3v2 tag header, which should be the first information in the file, is 10 bytes as follows

Do you have any example mp3 where that's a problem? Because the tag is not just found when an "ID3" string exists in the file, it also has to be a valid tag header, which should be very unlikely in an mpeg data stream.

realies · 2025-07-11T16:18:35Z

Thank you for the review! You're absolutely correct about the ID3v2 specification - I apologize for misreading it. ID3v2.4 does indeed allow appended tags (Section 5), and ID3v2.3 uses "should" not "must" for prepending.

However, we do have a concrete example where this bug caused severe corruption:

Real-world Example

In our production system, an MP3 file was corrupted from 82 MB to 15.1 MB when using removeTags() followed by write(). The problematic bytes at position 15,873,532 were:

49 44 33 04 00 3d 30 32 01 33

These bytes passed all validation in isValidID3Header():

49 44 33 = "ID3" ✓
04 = ID3v2.4 ✓
00 = flags byte ✓
3d 30 32 01 = valid size bytes (all have bit 7 unset) ✓
Decoded size: ~96.8 MB

Result: 66.8 MB of audio data was deleted from the file.

The Current Implementation Issue

Looking at the codebase, node-id3 doesn't actually implement proper support for appended tags:

getFramePosition() searches the entire buffer for "ID3" - this isn't checking for appended tags, it's just a brute force search
There's no code to check the end of the file for appended tags
There's no code to handle the footer that ID3v2.4 requires for appended tags

So while the spec allows appended tags, the current implementation doesn't properly support them anyway. It only works correctly with prepended tags.

Proposed Solutions

I see a few options:

Option 1: Keep the current fix (my preference)

Only check position 0 for ID3 tags
This prevents the corruption bug
Maintains compatibility with how node-id3 actually works (prepended tags only)
Simple and safe

Option 2: Properly implement appended tag support

Check position 0 for prepended tags
Check the end of the file for appended tags (with footer detection for v2.4)
More complex but fully spec-compliant

Option 3: Add a safety check

Keep searching the buffer but add validation that the detected position makes sense
E.g., if we find "ID3" at position 15MB claiming to be 96MB, that's clearly wrong
Risk: Still vulnerable to edge cases

What approach would you prefer? I'm happy to implement any of these solutions. The current PR implements Option 1 as the simplest fix that prevents data corruption while maintaining current functionality.

fix: prevent file corruption when ID3 bytes appear in audio data

d474432

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: prevent file corruption when ID3 bytes appear in audio data #185

fix: prevent file corruption when ID3 bytes appear in audio data #185

Uh oh!

realies commented Jul 10, 2025

Uh oh!

Zazama commented Jul 10, 2025

Uh oh!

realies commented Jul 11, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

fix: prevent file corruption when ID3 bytes appear in audio data #185

Are you sure you want to change the base?

fix: prevent file corruption when ID3 bytes appear in audio data #185

Uh oh!

Conversation

realies commented Jul 10, 2025

Summary

The Problem

Real-world impact:

The Solution

Changes:

Testing

Compatibility

Root Cause Analysis

References

Breaking Changes

Uh oh!

Zazama commented Jul 10, 2025

Uh oh!

realies commented Jul 11, 2025

Real-world Example

The Current Implementation Issue

Proposed Solutions

Option 1: Keep the current fix (my preference)

Option 2: Properly implement appended tag support

Option 3: Add a safety check

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants