add: metacat can predict on spans in arbitrary spangroups #391

jkgenser · 2024-01-22T16:56:26Z

Hello - I opened this PR request to start a discussion about adding a new feature that allows a trained MetaCAT model to run on arbitrary spangroup. This will will allow configuring a MetaCAT model such that it can be pointed at an arbitrary spacy span group and set meta_anns on it.

Currently, it is allowed to run on the two set of list[spacy.token.Span]: doc.ents or doc._.ents.

However, there are use cases where we may store other spans as part of a pipeline under doc.spans["my_spangroup_name"] that are distinct from doc.ents in which we want to call predict on the meta cat model.

I don't think it's yet necessary to be able to train on arbitrary spangroups since that would get into changing the input format or extract format from medcat trainer.

One workaround is to either zero out the ents object and set the spangroup to those ents and call predict however this starts to feel hacky.

Another workaround is to create a duplicate spacy.token.Doc object and using doc.ents for the separate spangroup. However this is inefficient as it requires running parts of the spacy pipeline again and means we can't use this spangroup in other pipeline components.

mart-r

Thanks for putting in the effort! It's much appreciated!

Some changes for GHA to run.
Some changes for clarity.

medcat/config_meta_cat.py

medcat/meta_cat.py

tests/test_meta_cat.py

jkgenser · 2024-01-23T13:19:59Z

Thanks for putting in the effort! It's much appreciated!

Some changes for GHA to run. Some changes for clarity.

Thank you for your comments! I will address them.

jkgenser · 2024-01-23T13:59:34Z

@mart-r I have iterated on your PR comments. I also added a more informative warning in case the span group is not set no the doc and a test for it.

jkgenser · 2024-01-26T13:34:31Z

Thanks for putting in the effort! It's much appreciated!

Some changes for GHA to run. Some changes for clarity.

I've done my best to address all of your changes. Let me know if you have further comments!

mart-r

Looks good overall.

But please address the GitHub Actions issues (linting so far):

medcat/meta_cat.py:359:1: W293 blank line contains whitespace

jkgenser · 2024-01-29T20:06:54Z

Looks good overall.

But please address the GitHub Actions issues (linting so far):

medcat/meta_cat.py:359:1: W293 blank line contains whitespace

@mart-r Got it, I fixed the linting and flake8 passes locally.

add: ability to predict on other spangroups

3b2c328

jkgenser changed the title ~~add: ability to predict on other spangroups~~ add: metacat can predict on spans in arbitrary spangroups Jan 22, 2024

mart-r requested changes Jan 23, 2024

View reviewed changes

medcat/config_meta_cat.py Show resolved Hide resolved

medcat/meta_cat.py Outdated Show resolved Hide resolved

tests/test_meta_cat.py Outdated Show resolved Hide resolved

tests/test_meta_cat.py Outdated Show resolved Hide resolved

add: pr comments and better error

ed7d653

jkgenser requested a review from mart-r January 23, 2024 13:59

fix: typo

5706016

mart-r approved these changes Jan 29, 2024

View reviewed changes

fix: linting

fb51620

jkgenser requested a review from mart-r January 29, 2024 20:06

mart-r approved these changes Jan 30, 2024

View reviewed changes

mart-r merged commit 85cbe77 into CogStack:master Jan 30, 2024
5 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add: metacat can predict on spans in arbitrary spangroups #391

add: metacat can predict on spans in arbitrary spangroups #391

jkgenser commented Jan 22, 2024

mart-r left a comment

jkgenser commented Jan 23, 2024

jkgenser commented Jan 23, 2024

jkgenser commented Jan 26, 2024

mart-r left a comment

jkgenser commented Jan 29, 2024

add: metacat can predict on spans in arbitrary spangroups #391

add: metacat can predict on spans in arbitrary spangroups #391

Conversation

jkgenser commented Jan 22, 2024

mart-r left a comment

Choose a reason for hiding this comment

jkgenser commented Jan 23, 2024

jkgenser commented Jan 23, 2024

jkgenser commented Jan 26, 2024

mart-r left a comment

Choose a reason for hiding this comment

jkgenser commented Jan 29, 2024