Upgrade to OpenNLP 2.5.x #14029
Comments
FYI @cpoerschke - if you are interested in bringing this together and encounter questions, the OpenNLP PMC members are happy to provide answers.
I was looking into this (trying to upgrade to 2.5.1) and initially ran into some failing test cases. It looks like they were all related to the switch of the default POSTagFormat from PENN to UD. I was able to get all the tests passing by changing line 33 of lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/tools/NLPPOSTaggerOp.java (as of commit 9a88d8a) to construct the tagger with the PENN format.
(I assume that we should support UD-style tags eventually too, but this at least keeps the existing functionality the same.)
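A minimal sketch of the kind of change described above, not the actual Lucene diff: it assumes OpenNLP 2.5.x's `POSTaggerME(POSModel, POSTagFormat)` constructor, which lets a caller keep emitting Penn Treebank tags instead of the new UD default. The class and method names here are illustrative only.

```java
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTagFormat;
import opennlp.tools.postag.POSTaggerME;

// Illustrative wrapper, not the real NLPPOSTaggerOp class.
public class PennTaggerSketch {
  private final POSTaggerME tagger;

  public PennTaggerSketch(POSModel model) {
    // Before 2.5.x, new POSTaggerME(model) produced Penn Treebank tags by default.
    // After the upgrade, request Penn tags explicitly so existing behavior (and tests) stay the same.
    this.tagger = new POSTaggerME(model, POSTagFormat.PENN);
  }

  public String[] getPOSTags(String[] words) {
    return tagger.tag(words);
  }
}
```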
@msfroh Thanks for checking. The option you chose (PENN format) is the quick option for updating to 2.5.x (hint: x=2 released today). The UD format will give the Lucene project the possibility to rely on a wider range of models for 32 languages that we have trained and published recently (see the OpenNLP models page). Might be an option for 2025 and onwards: just switch to the UD model files and the corresponding format. Open for any further feedback/questions.
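A hedged illustration of the alternative path mentioned above: tagging with the UD (Universal Dependencies) tag set, which is the default in 2.5.x. The model file name is hypothetical; any of the published UD models from the OpenNLP models page would be loaded the same way.

```java
import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTagFormat;
import opennlp.tools.postag.POSTaggerME;

public class UDTaggingSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical model file name; substitute a real UD POS model.
    try (InputStream in = new FileInputStream("opennlp-en-ud-pos.bin")) {
      POSModel model = new POSModel(in);
      // UD is the default format in 2.5.x; passing it explicitly documents the intent.
      POSTaggerME tagger = new POSTaggerME(model, POSTagFormat.UD);
      String[] tags = tagger.tag(new String[] {"Lucene", "tags", "tokens"});
      // Expect UD tags such as PROPN/VERB/NOUN rather than Penn tags like NNP/VBZ/NNS.
      for (String tag : tags) {
        System.out.println(tag);
      }
    }
  }
}
```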
Description
Apache OpenNLP 2.5.0 has been released. This version contains new, thread-safe implementations of TokenNameFinder et al. Moreover, models for many new languages (32, as of Nov 2024) are now available. Those models are also available as Maven artifacts.
Apache OpenNLP 2.5.0 requires Java 17 and should be fully compatible with Java 21.
This task is to update the OpenNLP dependency version to 2.5.x (x >= 0). Note: Release 2.5.1 is expected in December 2024.