Skip to content

Conversation

francolq
Copy link
Contributor

try:
m = next(i)
start, end = m.span()
# FIXME: do not count from the beggining
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should we do with this FIXME note? Is it already fixed and you just forgot to remove the comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, my bad. This is not a FIXME, it is at most a TODO, because a small optimization can be done here. I think the comment can be safely removed.

# preprocess the regular expression
regexp = re.sub(r'\s', '', regexp)
# replace < and > only if not double (<< or >>):
# FIXME: avoid matching \< and \>.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question re FIXME here.
Can't be solved before merging into develop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost same answer here. My bad to call this a FIXME. It would be an enhancement to allow escaping '<' and '>'. The comment can be removed.


def __init__(self, tokens):
# replace < and > inside tokens with \< and \>
_raw = '><'.join(w.replace('<', '\<').replace('>', '\>') for w in tokens)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm no completely sure, but would it be a problem if there is a token with \< (or \>) inside it?

def run_ner(self, doc):
entities = []
tokens = doc.tokens
searcher = TokenSearcher(tokens)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TokenSearcher is implemented as a class but it is stateless in a practical sense and it is used more like a function than a class.

import re
import codecs

from nltk.text import TokenSearcher as NLTKTokenSearcher
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This import looks unused since (apparently) no method from NLTKTokenSearcher is used.

token_start = self._raw[:start].count('><')
token_end = self._raw[:end].count('><')
yield MatchObject(m, token_start, token_end)
except:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This try...except is dangerous because it silently hides any error that can happen inside the loop.

Why not to use for m in i: instead?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants