Maybe this is trivial and I am wrong.
From the paper, I think the count of a k-gram "word" is the number of times it occurs in the corpus data, not the number of higher-order gram types it appears in. If this is the case,
kneser-ney/kneser_ney.py
Line 58 in 2740fba
But even this is troublesome. For example, ('?', '', '') in last_order will add its suffix ('', '') to new_order, but I think a bigram made of two pad symbols is not valid in a bigram model.
Therefore, I think a better way to do the k-gram counts is to compute each order independently and directly from the corpus data.
The KneserNeyLM class definition, which takes the highest-order n-grams (ngrams) as an argument, and example.py, which uses gut_ngrams, would need to be revised as well.
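
To make the proposal concrete, here is a minimal sketch of counting each order independently from the corpus. It is not the repository's code: the function name `count_kgrams`, the pad symbols `<s>` and `</s>`, and the choice to pad an order-k pass with k - 1 symbols on each side are all assumptions made for illustration. It computes raw occurrence counts only, which is the reading of the paper suggested above.

```python
from collections import Counter


def count_kgrams(sentences, highest_order, bos="<s>", eos="</s>"):
    """Count k-grams for every order 1..highest_order directly from the corpus.

    Hypothetical helper: returns a list whose element k - 1 is a Counter
    mapping each k-gram tuple to its number of occurrences.
    """
    counts = []
    for k in range(1, highest_order + 1):
        order_counts = Counter()
        for sent in sentences:
            if not sent:
                continue  # an empty sentence would yield pad-only k-grams
            # Assumption: pad with k - 1 symbols per side, so a k-gram
            # always contains at least one real token.
            padded = [bos] * (k - 1) + list(sent) + [eos] * (k - 1)
            for i in range(len(padded) - k + 1):
                order_counts[tuple(padded[i:i + k])] += 1
        counts.append(order_counts)
    return counts


if __name__ == "__main__":
    corpus = [["is", "it", "?"], ["it", "is", "."]]
    unigrams, bigrams, trigrams = count_kgrams(corpus, 3)
    print(bigrams[("?", "</s>")])
    print(("</s>", "</s>") in bigrams)
```

Because each order is padded separately, the trigram pass still sees ('?', '</s>', '</s>'), but the bigram pass never produces a bigram made of two pad symbols.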