fix(asr): log active audio filename in AsrPipeline._build_document #3535
charlesaurav13 wants to merge 3 commits into
When converting a batch (e.g. a directory of audio files), there was no per-file log entry indicating which document was being processed. Only a single 'Going to convert document batch...' message appeared, making it impossible to track progress or identify which file was active.

Add a per-document info log at the start of _execute_pipeline, which fires for both sequential and concurrent conversion paths.

Fixes docling-project#3467
❌ DCO Check Failed

Hi @charlesaurav13, your pull request has failed the Developer Certificate of Origin (DCO) check. This repository supports remediation commits, so you can fix this without rewriting history, but you must follow the required message format.

🛠 Quick Fix: Add a remediation commit

Run this command:

git commit --allow-empty -s -m "DCO Remediation Commit for Saurav Pandey <sauravp1236@gmail.com>
I, Saurav Pandey <sauravp1236@gmail.com>, hereby add my Signed-off-by to this commit: 58650cbbb5e986562452cd4d615c8707a0a76c21"
git push

🔧 Advanced: Sign off each commit directly

For the latest commit:

git commit --amend --signoff
git push --force-with-lease

For multiple commits:

git rebase --signoff origin/main
git push --force-with-lease

More info: DCO check report
Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
I, Saurav Pandey <sauravp1236@gmail.com>, hereby add my Signed-off-by to this commit: 58650cbdb05c0fcb5c54c8a4d6e83f3faee93e4d

Signed-off-by: Saurav Pandey <sauravp1236@gmail.com>
Codecov Report

✅ All modified and coverable lines are covered by tests.
@charlesaurav13 thanks for the proposal, but I think the fix is not complete: it would duplicate the logging for PDF files, for example, where the log is already emitted in the particular pipeline, such as:

Either you move this log into the simple pipeline / audio backend, or we must remove the logs in the other pipelines. Please also apply the linting (
cau-git left a comment
changes required as outlined above.
Good catch, adding the log at The right scope is the ASR pipeline specifically (where no equivalent log currently exists). I'll update this PR to move the log into Will also clear the DCO and run
The log was placed at _execute_pipeline level in DocumentConverter, which fires for all document types and duplicates the existing 'Processing document X' message already emitted by BasePipeline.execute.

Scope the log to AsrPipeline._build_document instead, where audio transcription specifically starts. Also improve the message from the internal 'start _build_document in AsrPipeline: <full_path>' to the user-friendly 'Transcribing audio document <filename>.' using only the filename (not the full path) for consistency with other pipeline logs.

Fixes docling-project#3467

Signed-off-by: Saurav Pandey <sauravp1236@gmail.com>
Updated. The change is now scoped to
This means the log fires only for audio files, with no duplication. DCO signed off with
Summary
When converting a batch (e.g. a directory of audio files with docling /path/to/dir), only a single 'Going to convert document batch...' log entry appeared for the entire batch, with no indication of which file was being processed.

This made it impossible to track progress or identify which file was currently active, which is especially confusing for long audio transcriptions.

Fix: add a per-document info log at the start of _execute_pipeline. This fires for both sequential and concurrent conversion paths and is consistent with the existing 'Finished converting document X in Y sec.' log that follows it.

Reproduction
Previously only showed 'Going to convert document batch...' with no per-file progress.

Related issue
Closes #3467