Description:
When a single-page document (e.g. a short policy or memo) is passed as input, the TOC detector incorrectly classifies the entire page as a table of contents. The pipeline then sets start_page_index = 1, which is out of bounds for a one-page document, so no content is processed.
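The failure mode can be illustrated with a minimal sketch. The function `select_content_pages` and the way `start_page_index` is derived from the detected TOC pages are assumptions made for illustration only; the actual pipeline code is not shown in this description.

```python
from typing import List


def select_content_pages(pages: List[str], toc_page_indices: List[int]) -> List[str]:
    """Return the pages that follow the last detected TOC page.

    Hypothetical reconstruction: assumes content starts on the page
    immediately after the last page flagged as a TOC.
    """
    if toc_page_indices:
        start_page_index = max(toc_page_indices) + 1
    else:
        start_page_index = 0
    # Slicing past the end of a list does not raise; it silently yields [].
    return pages[start_page_index:]


# A one-page policy whose numbered sections resemble a TOC.
pages = ["1. Purpose\n2. Scope\n3. Responsibilities"]

# Misclassified as a TOC: start_page_index becomes 1 and nothing is left.
print(select_content_pages(pages, toc_page_indices=[0]))  # []

# Correctly classified as content: the page is processed.
print(len(select_content_pages(pages, toc_page_indices=[])))  # 1
```

Under these assumptions the out-of-bounds index does not raise an error, which would explain why the symptom is an empty result rather than a crash.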
Root cause:
The toc_detector_single_page prompt lacked sufficient guidance to distinguish between a true TOC (listing references to content elsewhere) and structured content that resembles a TOC (e.g. numbered policy sections).
Fix:
Strengthen the toc_detector_single_page prompt to state explicitly that pages containing actual document content with numbered sections (policies, regulations, rules) are not TOCs, and that a true TOC only lists references to content found elsewhere in the document.
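As a rough sketch of what such guidance might look like, the snippet below builds a prompt with the two clarifications spelled out. The wording, the constant `TOC_GUIDANCE`, and the helper `build_toc_prompt` are hypothetical and are not the actual text of the toc_detector_single_page prompt.

```python
# Hypothetical guidance text; the real prompt wording is not reproduced here.
TOC_GUIDANCE = """\
Decide whether the page below is a table of contents (TOC).

A page IS a TOC only if it lists references (titles, usually with page
numbers or links) to content found elsewhere in the document.

A page is NOT a table of contents if it contains the actual document
content, even when that content is organized as numbered sections
(for example policies, regulations, or rules).

Answer with "yes" or "no".
"""


def build_toc_prompt(page_text: str) -> str:
    """Combine the guidance with the text of the page being classified."""
    return f"{TOC_GUIDANCE}\nPage text:\n{page_text}\n"


prompt = build_toc_prompt("1. Purpose\nThis policy applies to all staff.")
print(prompt)
```

Stating both the positive and the negative criterion is a design choice in this sketch: it gives the model an explicit rule for the numbered-sections case instead of leaving it to infer one.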