After fine-tuning the English model, I noticed that it performed significantly worse on natural language than the default `eng` model.
This makes sense, since the README says no dictionary is added to newly trained models. But how can I add one? Is it just a matter of copying the `*-dawg` files from one `.traineddata` file to another, or does the wordlist have to be part of the training? Would that mean my ground truth has to be natural language? (I generated random strings, since that was quick to do.)
And if it's the former (the easy option), which files do I need to call `combine_tessdata` on? The `without_dict.traineddata` itself, or only the files extracted from it?
I'm asking because adding the `-dawg` files made no difference at all in the evaluation. Or does `lstmeval` not use dictionaries, since it runs the raw LSTM rather than calling tesseract?
Don't feel obliged to answer, since this is not really an "issue", but help would be very much appreciated!