Upgrade to OpenNLP 2.5.x #14029
Comments
FYI @cpoerschke - if you are interested in bringing this together and encounter questions, the OpenNLP PMC members are happy to provide answers.
I was looking into this (trying to upgrade to 2.5.1) and initially ran into some failing test cases. It looks like they were all related to the switch of the default POSTagFormat from PENN to UD. I was able to get all the tests passing by changing line 33 of lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/tools/NLPPOSTaggerOp.java (as of commit 9a88d8a) to construct the tagger with the PENN format.
(I assume that we should support UD-style tags eventually too, but this at least keeps the existing functionality the same.)
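A minimal sketch of the kind of change described above, not the actual Lucene diff: it assumes OpenNLP 2.5.x's `POSTaggerME(POSModel, POSTagFormat)` constructor, which lets a caller keep emitting Penn Treebank tags instead of the new UD default. The class and method names here are illustrative only.

```java
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTagFormat;
import opennlp.tools.postag.POSTaggerME;

// Illustrative wrapper, not the real NLPPOSTaggerOp class.
public class PennTaggerSketch {
  private final POSTaggerME tagger;

  public PennTaggerSketch(POSModel model) {
    // Before 2.5.x, new POSTaggerME(model) produced Penn Treebank tags by default.
    // After the upgrade, request Penn tags explicitly so existing behavior (and tests) stay the same.
    this.tagger = new POSTaggerME(model, POSTagFormat.PENN);
  }

  public String[] getPOSTags(String[] words) {
    return tagger.tag(words);
  }
}
```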
@msfroh Thanks for checking. The option you chose (PENN format) is the quick option for updating to 2.5.x (hint: x=2 released today). The UD format will give the Lucene project the possibility to rely on a wider range of models for 32 languages that we have trained and published recently (see the OpenNLP models page). Might be an option for 2025 and onwards: just switch to the UD model files and the corresponding format. Open for any further feedback/questions.
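A hedged illustration of the alternative path mentioned above: tagging with the UD (Universal Dependencies) tag set, which is the default in 2.5.x. The model file name is hypothetical; any of the published UD models from the OpenNLP models page would be loaded the same way.

```java
import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTagFormat;
import opennlp.tools.postag.POSTaggerME;

public class UDTaggingSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical model file name; substitute a real UD POS model.
    try (InputStream in = new FileInputStream("opennlp-en-ud-pos.bin")) {
      POSModel model = new POSModel(in);
      // UD is the default format in 2.5.x; passing it explicitly documents the intent.
      POSTaggerME tagger = new POSTaggerME(model, POSTagFormat.UD);
      String[] tags = tagger.tag(new String[] {"Lucene", "tags", "tokens"});
      // Expect UD tags such as PROPN/VERB/NOUN rather than Penn tags like NNP/VBZ/NNS.
      for (String tag : tags) {
        System.out.println(tag);
      }
    }
  }
}
```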
Description
Apache OpenNLP 2.5.0 has been released. This version contains new, thread-safe implementations of TokenNameFinder et al. Moreover, models for many new languages (32, as of Nov 2024) are now available. Those models are also available as Maven artifacts.
Apache OpenNLP 2.5.0 requires Java 17 and should be fully compatible with Java 21.
This task is to update the OpenNLP dependency version to 2.5.x (x >= 0). Note: Release 2.5.1 is expected in December 2024.