
Empty Hypotheses with n_best-oracle Decoding in ctc_decode.py #1870

Open
moadel2002 opened this issue Jan 28, 2025 · 2 comments

Comments

@moadel2002

When I decode using the nbest-oracle method in ctc_decode.py on my dataset, I consistently get empty hypotheses, resulting in 100% WER.
I trained a zipformer model on my own dataset; decoding the same checkpoint with streaming decode works fine. I also created HLG.pt using the steps in prepare_lm.sh (3-gram model).
Here is the script I used for decoding.
Instead of using the librispeech data, I added two arguments (--test-set-cut-path, --test-set-name):

./zipformer/ctc_decode.py \
    --test-set-cut-path "data/manifests/MGB3_dev/cuts.jsonl.gz" \
    --test-set-name "MGB3_dev" \
    --epoch 39 \
    --exp-dir "zipformer/exp" \
    --max-duration 100 \
    --context-size 2 \
    --num-paths 100 \
    --on-the-fly-feats True \
    --nbest-scale 0.5 \
    --decoding-method "nbest-oracle" \
    --hlg-scale 0.6 \
    --lang-dir data/lang_bpe_5000 \
    --use-ctc True
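For context, the two custom arguments could be registered with argparse roughly like this. This is a sketch of the change described above, not the actual patch; the helper name `add_test_set_args` is hypothetical, and the lhotse loading call in the comment is an assumption about how the cuts would then be consumed:

```python
import argparse


def add_test_set_args(parser: argparse.ArgumentParser) -> None:
    # Hypothetical helper: register the two custom arguments used in the
    # decoding script above.
    parser.add_argument(
        "--test-set-cut-path",
        type=str,
        help="Path to a lhotse CutSet manifest (e.g. cuts.jsonl.gz) to decode.",
    )
    parser.add_argument(
        "--test-set-name",
        type=str,
        help="Name used for this test set in the result files.",
    )


parser = argparse.ArgumentParser()
add_test_set_args(parser)
args = parser.parse_args(
    [
        "--test-set-cut-path", "data/manifests/MGB3_dev/cuts.jsonl.gz",
        "--test-set-name", "MGB3_dev",
    ]
)
print(args.test_set_name)  # MGB3_dev

# The cuts would then be loaded with lhotse, e.g. (assumption, not shown
# in the issue):
#   from lhotse import load_manifest_lazy
#   cuts = load_manifest_lazy(args.test_set_cut_path)
```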

Here is a snapshot of the output in errs.txt:
%WER = 100.00 Errors: 0 insertions, 30997 deletions, 0 substitutions, over 30997 reference words (0 correct)
Search below for sections starting with PER-UTT DETAILS:, SUBSTITUTIONS:, DELETIONS:, INSERTIONS:, PER-WORD STATS:

PER-UTT DETAILS: corr or (ref->hyp)
comedy_75_first_12min_566.284_574.964-58: (أما بييجي يقول اليوم بيقول اليوم وكان هذا هي نشرة أخبار اليوم->*)
comedy_75_first_12min_133.783_142.442-4: (أ قرود زي بعض قروض زي بعض آه لكن يبقى أرنب أخو قرد ما هو كارتون بقى وكده آه أطفال أطفال مش فاهمين أطفال بقى آه آه->*)
comedy_75_first_12min_218.077_226.666-12: (عامل نفسه كوميدي قال يعني يبص إزاي فتقوم التانية أول ما تشوفه->*)

@moadel2002
Author

@danpovey

@danpovey
Collaborator

danpovey commented Feb 8, 2025

I think you should add various printouts in the function nbest_oracle in icefall/decode.py. Try printing out the ref_texts, the lattice, and the word_table. Check whether the texts in ref_texts can be successfully split into words with .split(), and whether the words are successfully looked up in the symbol table.

Edit: it looks like it's the hyp that's empty, so you might focus on the lattice. Is the lattice nonempty? Try looking up some of the word-ids in aux_labels and see if they point to correct-looking words. You can also check whether doing this on CPU vs. GPU makes a difference: calling nbest_oracle with lattice.to('cpu') in place of lattice may help. (I'm not sure what it's called in the calling code; possibly nbest.)
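The printouts suggested above might look roughly like this when dropped into nbest_oracle. This is a debugging sketch, not a verified patch: `lattice`, `ref_texts`, and `word_table` are the objects already in scope there, and the attribute names (`num_arcs`, `aux_labels`) follow k2's Fsa API but should be double-checked against the installed version:

```python
def debug_nbest_oracle(lattice, ref_texts, word_table):
    """Print diagnostics for empty-hypothesis debugging (sketch)."""
    # 1. Is the lattice nonempty? An empty lattice yields empty hypotheses.
    print("lattice arcs:", lattice.num_arcs)

    # 2. Do the reference texts split into words, and are all words known?
    #    (Assumes the symbol table supports membership tests; otherwise
    #    use word_table.get(w).)
    for text in ref_texts[:5]:
        words = text.split()
        print("num words:", len(words))
        missing = [w for w in words if w not in word_table]
        if missing:
            print("words not found in symbol table:", missing)

    # 3. Spot-check some word-ids on the lattice's aux_labels; for a
    #    ragged aux_labels, .values flattens it (uncomment to try):
    # for word_id in lattice.aux_labels.values[:20].tolist():
    #     print(word_id, word_table[word_id])

    # 4. Rule out a GPU-specific issue by re-running on CPU:
    # nbest_oracle(lattice.to("cpu"), ...)
```

Calling it just before the nbest construction in nbest_oracle should show quickly whether the problem is an empty lattice, OOV reference words, or bad aux_labels.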
