arpa2fst.py output an empty G_3_gram.fst.txt without any error messages #1877
Comments
Does your words.txt match your arpa file, i.e., are the words in the arpa file present in words.txt?
Is there a tool that can check whether the arpa file and words.txt are compatible? I took a quick look and found that the words that appear in the arpa file can basically all be found in words.txt.
It won't be closed.
I asked ChatGPT to write a script to check.
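For reference, a check of this kind can be sketched in a few lines of Python. This is not the script used in the thread, only a minimal illustration: the function names are hypothetical, it assumes words.txt has one `word id` pair per line, and it skips `<s>` and `</s>` by default on the assumption that the converter handles sentence-boundary tokens itself (pass `ignore=()` to include them).

```python
def read_symbol_table(lines):
    """Return the set of symbols in a words.txt-style table (one 'word id' pair per line)."""
    symbols = set()
    for line in lines:
        fields = line.split()
        if fields:
            symbols.add(fields[0])
    return symbols


def read_arpa_unigrams(lines):
    """Return the words listed in the 1-grams section of an ARPA file, in file order."""
    words = []
    in_unigrams = False
    for line in lines:
        line = line.strip()
        if line == "\\1-grams:":
            in_unigrams = True
            continue
        if in_unigrams:
            if line.startswith("\\"):  # next section marker, e.g. the 2-grams header
                break
            fields = line.split()
            # A unigram entry is: log-prob, word, optional back-off weight.
            if len(fields) >= 2:
                words.append(fields[1])
    return words


def missing_words(arpa_lines, symbol_lines, ignore=("<s>", "</s>")):
    """Return ARPA unigrams that are absent from the symbol table."""
    symbols = read_symbol_table(symbol_lines)
    return [
        w for w in read_arpa_unigrams(arpa_lines)
        if w not in symbols and w not in ignore
    ]


# Example usage with the paths from this issue (adjust as needed):
# with open("vword.3gram.th1e-7.arpa", encoding="utf-8") as arpa, \
#         open("data/lang_phone/words.txt", encoding="utf-8") as table:
#     print(missing_words(arpa, table)[:20])
```

An empty result means every unigram has an entry in the symbol table. Any word that is printed would be worth looking at first.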
Current situation: The words that appear in the ARPA file also appear in the I have also tried the following:
Since I have no experience building language models, I would appreciate some advice. Thank you.
Hi,
I use prepare_lm.sh to build HLG. However, the output file G_3_gram.fst.txt is empty.
I can convert vword.3gram.th1e-7.arpa to G.fst using arpa2fst in Kaldi.
However, I need G.fst in text format.
How can I debug arpa2fst in k2?
The relevant part of prepare_lm.sh:
...
mkdir -p data/lm
if [ ! -f data/lm/G_3_gram.fst.txt ]; then
  # It is used in building HLG
  python3 -m kaldilm \
    --read-symbol-table="data/lang_phone/words.txt" \
    --disambig-symbol='#0' \
    --max-order=3 \
    $lm_word_dir/vword.3gram.th1e-7.arpa > data/lm/G_3_gram.fst.txt
...
vword.3gram.th1e-7.arpa looks like:
\data\
ngram 1= 45342
ngram 2= 1110560
ngram 3= 342977
\1-grams:
-2.34093 </s>
-99 <s> -1.42299
-6.19571 GAdigAlik -0.129765
-4.53936 GAlbA -0.289333
-5.16337 GAlbigA -0.218938
-5.03704 GAlbilik -0.226182
-3.77986 GAlibA -0.498246
-5.97037 GAlibAN -0.200701
-4.34576 GAlibilik -0.278693
-4.56942 GAlibini -0.501565
-5.00196 GAlibiseri -0.446753
-4.57569 GAlibisi -0.203198
-5.27412 GAlibisigA -0.173731
-4.6907 GAlibisini -0.279294
-3.78802 GAlitA -0.327788
-5.52259 GAlitirAk -0.155028
-4.53936 GAllA -0.434895
-5.24367 GAlwA -0.170739
-5.21522 GAlwir -0.132563
-6.06158 GAlwirdA -0.129765
-4.85956 GAlyan -0.365357
-3.91926 GAm -0.475768
-6.34759 GAmHoluqiGa
-6.34759 GAmHorloq
-4.46482 GAmHorluq -0.438952
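One way to narrow down whether the ARPA file itself is malformed is to compare the n-gram counts declared in its header with the entries actually present, and to flag entries with the wrong number of fields. The sketch below is a standalone check written for illustration (the function name `check_arpa` is hypothetical and it is not part of kaldilm or k2). It assumes the usual ARPA layout: `ngram N=count` header lines, `\N-grams:` section markers, and entries of the form `log-prob w1 ... wN [back-off]`.

```python
import re


def check_arpa(lines):
    """Return a list of human-readable problems found in an ARPA file."""
    declared = {}   # order -> count declared in the header
    found = {}      # order -> well-formed entries actually seen
    order = None    # order of the section currently being read
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        header = re.fullmatch(r"ngram\s+(\d+)\s*=\s*(\d+)", line)
        if header and order is None:
            declared[int(header.group(1))] = int(header.group(2))
            continue
        section = re.fullmatch(r"\\(\d+)-grams:", line)
        if section:
            order = int(section.group(1))
            found[order] = 0
            continue
        if line.startswith("\\") or order is None:
            continue  # the data/end markers, or text before the first section
        fields = line.split()
        # An entry has a log-prob, `order` words, and an optional back-off weight.
        if len(fields) in (order + 1, order + 2):
            found[order] += 1
        else:
            problems.append(
                f"line {lineno}: expected {order} word(s), got {fields!r}"
            )
    for n, count in sorted(declared.items()):
        if found.get(n, 0) != count:
            problems.append(
                f"order {n}: header declares {count}, found {found.get(n, 0)}"
            )
    return problems


# Example usage (path taken from this issue):
# with open("vword.3gram.th1e-7.arpa", encoding="utf-8") as f:
#     for problem in check_arpa(f):
#         print(problem)
```

An empty list only means the file is structurally consistent. It does not guarantee that a particular converter will accept it.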