[question] How to add a dictionary and what exactly is it? #416

Open
blauertee opened this issue Feb 25, 2025 · 0 comments

blauertee commented Feb 25, 2025

After finetuning the English model, I noticed that it performed significantly worse on natural language than the default eng model.

This makes sense, since the README says that no dictionary is added to newly trained models. But how can I add one? Is it just a matter of copying the *-dawg files from one .traineddata file to another, or does the word list have to be part of the training? Would that mean my ground truth has to be natural language? (I generated random strings, since that was quick to do.)

And if it's the former (the easy option), on which files do I need to call combine_tessdata: on without_dict.traineddata itself, or only on the files extracted from it?

I'm asking because adding the *-dawg files made no difference in the evaluation at all. Or does lstmeval not use dictionaries, since it runs the raw LSTM rather than calling tesseract?
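
To make the "easy option" concrete, here is a minimal sketch of it: extract the three LSTM dawg components from the stock eng.traineddata with `combine_tessdata -e`, then write them into without_dict.traineddata with `combine_tessdata -o`. It only builds and prints the commands unless `dry_run=False`. The helper names, the `dawg_tmp` working directory and the `eng` file prefix are illustrative assumptions. It also assumes the fine-tuned model still uses a unicharset compatible with the one the dawgs were built against, and it says nothing about whether lstmeval consults the dictionary.

```python
import subprocess
from pathlib import Path

# Names of the LSTM dictionary components inside a .traineddata file.
LSTM_DAWG_SUFFIXES = ("lstm-word-dawg", "lstm-punc-dawg", "lstm-number-dawg")


def dawg_paths(work_dir, prefix):
    """Paths of the extracted dawg files (combine_tessdata keys on the suffix)."""
    return [str(Path(work_dir) / f"{prefix}.{suffix}") for suffix in LSTM_DAWG_SUFFIXES]


def extract_cmd(source_traineddata, work_dir, prefix="eng"):
    """Command that extracts the dawg components from the source model."""
    return ["combine_tessdata", "-e", str(source_traineddata), *dawg_paths(work_dir, prefix)]


def overwrite_cmd(target_traineddata, work_dir, prefix="eng"):
    """Command that writes the extracted dawgs into the target model in place."""
    return ["combine_tessdata", "-o", str(target_traineddata), *dawg_paths(work_dir, prefix)]


def copy_dawgs(source_traineddata, target_traineddata, work_dir="dawg_tmp", dry_run=True):
    """Print (and optionally run) the extract and overwrite steps in order."""
    cmds = [
        extract_cmd(source_traineddata, work_dir),
        overwrite_cmd(target_traineddata, work_dir),
    ]
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            # Requires combine_tessdata on PATH; back up the target file first.
            Path(work_dir).mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)
    return cmds


if __name__ == "__main__":
    # Dry run: only shows the commands that would be executed.
    copy_dawgs("eng.traineddata", "without_dict.traineddata")
```

In this sketch, combine_tessdata is called on without_dict.traineddata itself (with `-o`), and the extracted files are passed as additional arguments. Running `combine_tessdata -d without_dict.traineddata` afterwards should list the components, which is one way to check that the dawgs were actually added.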

Don't feel obliged to answer; this is not an "issue" after all. But help would be much appreciated!
