@x15sr71 x15sr71 commented Nov 28, 2025

In raising this pull request, I confirm the following (please check boxes):

  • I have read and understood the contributors guide.
  • I have checked that another pull request for this purpose does not exist.
  • I have considered, and confirmed that this submission will be valuable to others.
  • I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
  • I give this submission freely, and claim no ownership to its content.
  • I have mentioned this change in the changelog.

My familiarity with the project is as follows (check one):

  • I have never used CCExtractor.
  • I have used CCExtractor just a couple of times.
  • I absolutely love CCExtractor, but have not contributed previously.
  • I am an active contributor to CCExtractor.

Description

Fixes #1759 - This PR restores functional XMLTV generation for ATSC broadcast streams and adds comprehensive EPG parsing capabilities. ATSC streams with EIT/VCT/ETT tables now generate complete XMLTV output with program titles, descriptions, and extended text metadata.

Problem

The -xmltv parameter was completely non-functional for ATSC broadcast streams. When processing ATSC transport streams containing valid EPG data (EIT tables), channel information (VCT/TVCT tables), and extended text (ETT tables), CCExtractor would:

  • Generate SRT caption files (working correctly)
  • NOT generate XMLTV files (the bug)
  • Ignore extended program descriptions from ETT tables
  • Drop events due to buffer boundary check errors

This made it impossible to extract Electronic Program Guide data from ATSC streams, despite the -xmltv parameter being specified.

Root causes identified:

  1. EPG events stored in fallback storage (TS_PMT_MAP_SIZE) were never output to XMLTV
  2. Inverted buffer boundary check logic (CHECK_OFFSET macro) caused parser failures and potential buffer overruns
  3. Limited ATSC table ID support (missing extended EIT tables, Cable VCT, and ETT tables)
  4. ATSC multiple_string parser incorrectly combined title and description into single field
  5. No support for ETT (Extended Text Table) parsing, losing detailed program information
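Root cause 1 can be illustrated with a small sketch. This is not the actual CCExtractor code: `PMT_MAP_SIZE`, `struct program_events`, and `count_output_events` are hypothetical stand-ins, and the "old" behaviour shown (fallback slot consulted only when no programs are mapped) is an assumption inferred from the fix description.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for TS_PMT_MAP_SIZE; the real value differs. */
#define PMT_MAP_SIZE 4

/* Hypothetical, simplified per-program event store. */
struct program_events {
    int event_count;
};

/*
 * Counts the events an output pass would emit.
 * Slots 0..nb_program-1 are program-mapped; slot PMT_MAP_SIZE is the fallback.
 * With always_check_fallback == 0 the fallback is read only when no programs
 * are mapped (assumed old behaviour); with 1 it is always read (the fix).
 */
static int count_output_events(const struct program_events slots[PMT_MAP_SIZE + 1],
                               int nb_program, int always_check_fallback)
{
    assert(nb_program >= 0);

    int total = 0;
    for (int i = 0; i < nb_program && i < PMT_MAP_SIZE; i++)
        total += slots[i].event_count;

    if (always_check_fallback || nb_program == 0)
        total += slots[PMT_MAP_SIZE].event_count;

    return total;
}
```

With two mapped programs and events sitting only in the fallback slot, the assumed old path emits none of the fallback events, which matches the symptom of a missing XMLTV file.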

Solution

Core Fixes

  1. Fixed EPG output logic (EPG_output() function)

    • Modified to always check fallback storage regardless of nb_program value
    • ATSC streams store events in fallback due to VCT source ID mapping, but these were being ignored
    • Now correctly outputs events from both program-mapped storage and fallback storage
    • Ensures ATSC VCT-defined channels generate XMLTV output
  2. Fixed critical buffer boundary check (CHECK_OFFSET macro)

    • Corrected inverted logic from < to > in boundary validation
    • Before: if (offset + val < offset_end) (incorrect - allowed overruns)
    • After: if (offset + (val) > offset_end) (correct - prevents overruns)
    • Applied consistently across EIT, VCT, and ETT parsing functions
    • Prevents crashes and incomplete parsing
  3. Extended ATSC table support (EPG_parse_table() function)

    • Added extended EIT table IDs: 0xCD, 0xCE, 0xCF, 0xD0 (in addition to 0xCB)
    • Added Cable VCT variant: 0xC9 (in addition to Terrestrial VCT 0xC8)
    • New: Added ETT (Extended Text Table) support: 0xCC
    • Ensures comprehensive ATSC EPG data extraction
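The boundary check in fix 2 can be sketched in isolation as follows. This is a minimal illustration rather than the real `CHECK_OFFSET` macro: `would_overrun` and `skip_length_prefixed` are hypothetical helpers. The comparison is written as `len > end - offset`, which expresses the same condition as `offset + len > end` without forming an out-of-range pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns 1 when reading `len` bytes starting at `offset` would run past `end`.
 * Precondition: offset <= end (both point into, or one past, the same buffer).
 */
static int would_overrun(const uint8_t *offset, size_t len, const uint8_t *end)
{
    assert(offset <= end);
    return len > (size_t)(end - offset);
}

/*
 * Skips one length-prefixed field (1-byte length, then that many bytes).
 * Returns a pointer just past the field, or NULL if the field is truncated.
 */
static const uint8_t *skip_length_prefixed(const uint8_t *offset, const uint8_t *end)
{
    if (would_overrun(offset, 1, end))
        return NULL; /* no room for the length byte */

    size_t len = *offset++;

    if (would_overrun(offset, len, end))
        return NULL; /* declared length exceeds the remaining data */

    return offset + len;
}
```

The parser bails out when the requested read exceeds the remaining bytes. An inverted comparison would instead reject valid fields and let oversized ones through.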

New Features

  1. Implemented ATSC ETT (Extended Text Table) parsing

    • Added EPG_ATSC_decode_ETT() function to parse ETT table structures
    • Added EPG_ATSC_decode_ETT_text() to extract multiple string format extended descriptions
    • ETT data now populates <desc> tags in XMLTV output with detailed program information
    • Matches ETT extended text to events by source_id (service_id)
    • Supports multi-segment, multi-language text extraction
  2. Enhanced ATSC multiple_string decoder (EPG_ATSC_decode_multiple_string())

    • Fixed to properly separate title (segment 0) and description (segment 1)
    • Before: Both segments written to same field, causing data loss
    • After: First segment → event_name (title), second segment → text (subtitle/description)
    • Added proper memory management and bounds checking
    • Only processes uncompressed ANSI strings (compression_type==0x00, mode==0x00)
  3. Improved XMLTV output formatting

    • Added proper indentation and line breaks for readability
    • ETT extended text now appears in <desc> tags (correct XMLTV placement)
    • Fixed empty subtitle handling (only output when text exists)
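The segment separation described for the multiple_string decoder can be sketched like this. It is a simplified illustration and not the code in `EPG_ATSC_decode_multiple_string()`: `struct decoded_strings` and `decode_multiple_string` are hypothetical names, and using only the first string (language) is an assumption made for brevity. The byte layout follows the ATSC A/65 multiple_string_structure: number_strings, then per string a 3-byte language code and number_segments, then per segment compression_type, mode, number_bytes, and the text bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical output holder: segment 0 goes to title, segment 1 to text. */
struct decoded_strings {
    char title[256]; /* number_bytes is 8 bits, so 255 bytes + NUL always fit */
    char text[256];
};

/*
 * Walks a multiple_string_structure of `size` bytes.
 * Copies only uncompressed segments (compression_type 0x00, mode 0x00) of the
 * first string. Returns 0 on success, -1 if the data is truncated.
 */
static int decode_multiple_string(const uint8_t *p, size_t size,
                                  struct decoded_strings *out)
{
    assert(p != NULL && out != NULL);

    const uint8_t *end = p + size;
    out->title[0] = '\0';
    out->text[0] = '\0';

    if (size < 1)
        return -1;
    uint8_t number_strings = *p++;

    for (uint8_t s = 0; s < number_strings; s++) {
        if ((size_t)(end - p) < 4) /* language code (3) + number_segments (1) */
            return -1;
        p += 3; /* skip ISO 639 language code */
        uint8_t number_segments = *p++;

        for (uint8_t seg = 0; seg < number_segments; seg++) {
            if ((size_t)(end - p) < 3) /* segment header */
                return -1;
            uint8_t compression_type = p[0];
            uint8_t mode = p[1];
            uint8_t number_bytes = p[2];
            p += 3;

            if ((size_t)(end - p) < number_bytes)
                return -1;

            if (s == 0 && compression_type == 0x00 && mode == 0x00) {
                char *dst = (seg == 0) ? out->title : (seg == 1) ? out->text : NULL;
                if (dst != NULL) {
                    memcpy(dst, p, number_bytes);
                    dst[number_bytes] = '\0';
                }
            }
            p += number_bytes;
        }
    }
    return 0;
}
```

Keeping separate destinations per segment index is what prevents the second segment from overwriting the first, which is the data loss described above.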

Testing

Tested with sample files provided by @TPeterson94070 in issue #1759:

  • channel5FullTS.ts - 5 channels with VCT/TVCT tables
  • ch12FullTS.ts - Additional ATSC test case
  • ch29FullTS.ts - 5 programs with extended EIT data (Nov 26-28, 2025)

Before this PR:

./ccextractor channel5FullTS.ts --xmltv 1

  • Output: Only .srt file generated
  • No XMLTV file created (bug)
  • ETT data completely ignored

After this PR:

./ccextractor channel5FullTS.ts --xmltv 1

  • Output: Both .srt AND .xml files generated successfully
  • XMLTV file contains:
    • Channel listings extracted from VCT with correct IDs
    • Program schedules parsed from EIT-0/1/2/3 (table IDs 0xCB and 0xCD-0xD0)
    • Extended program descriptions from ETT tables (0xCC)
    • UTC timestamps, titles, and subtitles properly captured
    • Unique ts-meta-id values matching EIT event IDs
    • Well-formatted XML with proper indentation

Sample XMLTV output (after ETT parsing):

Known Limitations

  • ATSC date/time conversion: occasionally produces incorrect years in some streams (pre-existing behavior).

  • Channel naming: XMLTV output uses numeric channel IDs (source_id) instead of human-readable names. VCT short_name and major/minor channel numbers are not currently mapped to XMLTV display-name elements.

  • Orphaned events: Some EIT events may appear under channel="0" when their service_id does not match any VCT-defined program. This occurs with malformed streams or when VCT data is incomplete.

These three issues (incorrect dates, numeric channel IDs, orphaned events) are data-quality problems that already existed in the codebase and are not caused by the primary bug fix in this PR.

I believe these should be addressed in follow-up PRs for better separation of concerns. However, if maintainers prefer these issues to be fixed in this PR, I'm happy to include them.

@x15sr71 x15sr71 force-pushed the fix/atsc-eit-xmltv-generation branch from 52cce44 to b033bde Compare December 9, 2025 17:25
@x15sr71 x15sr71 marked this pull request as ready for review December 9, 2025 18:09
@ccextractor-bot
Collaborator

CCExtractor CI platform finished running the test files on linux. Below is a summary of the test results, when compared to test for commit b293017...:
Report Name    Tests Passed
Broken         13/13
CEA-708        14/14
DVB            7/7
DVD            3/3
DVR-MS         2/2
General        27/27
Hardsubx       1/1
Hauppage       3/3
MP4            3/3
NoCC           10/10
Options        86/86
Teletext       21/21
WTV            13/13
XDS            34/34

Congratulations: Merging this PR would fix the following tests:


All tests passing on the master branch were passed completely.

Check the result page for more info.

@ccextractor-bot
Collaborator

CCExtractor CI platform finished running the test files on windows. Below is a summary of the test results, when compared to test for commit b293017...:
Report Name    Tests Passed
Broken         13/13
CEA-708        14/14
DVB            7/7
DVD            3/3
DVR-MS         2/2
General        27/27
Hardsubx       1/1
Hauppage       3/3
MP4            3/3
NoCC           10/10
Options        86/86
Teletext       21/21
WTV            13/13
XDS            34/34

Congratulations: Merging this PR would fix the following tests:

  • ccextractor --autoprogram --out=srt --latin1 f1422b8bfe..., Last passed: Never
  • ccextractor --datapid 5603 --autoprogram --out=srt --latin1 --teletext 85c7fc1ad7..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 --quant 0 85271be4d2..., Last passed: Never
  • ccextractor --hardsubx 1a0302f7fd..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 c0d2fba8c0..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 006fdc391a..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 e92a1d4d2a..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 7e4ebf7fd7..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 9256a60e4b..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 27d7a43dd6..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 297a44921a..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 efbe129086..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 eae0077731..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 e2e2b501e0..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 c6407fb294..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --datets dcada745de..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 --tpage 398 5d5838bde9..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 --teletext --tpage 398 3b276ad8bf..., Last passed: Never

All tests passing on the master branch were passed completely.

Check the result page for more info.

Development

Successfully merging this pull request may close these issues.

ccextractor appears to ignore -xmltv parameter