[question] How to add a dictionary and what exactly is it? #416

blauertee · 2025-02-25T17:26:26Z

After fine-tuning the English model, I noticed that it performed significantly worse on natural-language text than the default eng model.

This makes sense, since the README says no dictionary is added to newly trained models. But how can I add one? Is it just a matter of copying the *-dawg files from one .traineddata file to another, or does the wordlist have to be part of the training? If the latter, does that mean my ground truth has to be in natural language? (I generated random strings, since that was quick to do.)

And if it's the former (the easy option), which files do I need to call combine_tessdata on? On without_dict.traineddata itself as well, or only on the files extracted from it?

I'm asking because adding the -dawg files made no difference at all in the evaluation. Or does lstmeval not use dictionaries, since it runs the raw LSTM rather than calling tesseract?

Don't feel obliged to answer, since this is not an "issue" after all, but help would be much appreciated!
