In English (at least in the C version of the semantic tagger) we apply auxiliary verb rules to tokens with the POS tags VB* (be), VD* (do), and VH* (have) to distinguish main verbs from auxiliary verbs and alter the semantic tag accordingly.
An auxiliary verb would normally be given the USAS semantic tag Z5 (grammatical bin), whereas the main verb would be given a non-Z5 tag. For example, in the sentence below (the format is token_USAS semantic tag) the auxiliary verb is have and the main verb is finished:
I_Z8mf have_Z5 finished_T2- my_Z8 lunch_F1 ._PUNC
We have approximately 35 rules in place for amending the semantic tags on be, do, and have after the initial set of potential semantic tags has been applied. An example rule for have is as follows:
VH*[Z5] (RR*n) (RT*n) (XX) (RR*n) (RT*n) V*N
If the sequence of POS tags matches a given context, VH* (POS tag for have) followed by V*N (POS tag for the word finished) with optional intervening adverbs (R* POS tags) or negation (XX POS tag), then the rule instructs the tagger to change the semantic tag on the auxiliary verb have to be Z5.
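The matching described above can be sketched in Python. This is only an illustration and not the C tagger's implementation: the `HAVE_RULE` encoding and the function names are hypothetical, each parenthesised element is assumed to match zero or more tokens (the exact meaning of the trailing `n` in the rule syntax is not stated here), and the initial tag `A9+` on have is a placeholder for whatever the lexicon lookup would return.

```python
from fnmatch import fnmatchcase

# Hypothetical encoding of: VH*[Z5] (RR*n) (RT*n) (XX) (RR*n) (RT*n) V*N
# Each entry is (POS pattern, optional?). Assumption: an optional element
# may match zero or more consecutive tokens.
HAVE_RULE = [
    ("VH*", False),
    ("RR*", True),
    ("RT*", True),
    ("XX", True),
    ("RR*", True),
    ("RT*", True),
    ("V*N", False),
]


def rule_matches(pos_tags, start, rule=HAVE_RULE):
    """Return True if the rule matches the POS sequence beginning at `start`."""
    i = start
    for pattern, optional in rule:
        if optional:
            # Consume any run of tokens matching the optional pattern.
            while i < len(pos_tags) and fnmatchcase(pos_tags[i], pattern):
                i += 1
        elif i < len(pos_tags) and fnmatchcase(pos_tags[i], pattern):
            i += 1
        else:
            return False
    return True


def apply_have_rule(pos_tags, sem_tags):
    """Return a copy of `sem_tags` with Z5 set on every matched auxiliary have."""
    amended = list(sem_tags)
    for i in range(len(pos_tags)):
        if rule_matches(pos_tags, i):
            amended[i] = "Z5"
    return amended


# "I have finished my lunch ." with illustrative CLAWS C7 POS tags.
pos = ["PPIS1", "VH0", "VVN", "APPGE", "NN1", "."]
initial = ["Z8mf", "A9+", "T2-", "Z8", "F1", "PUNC"]  # A9+ is a placeholder
print(apply_have_rule(pos, initial))
```

The same matcher accepts intervening negation and adverbs, for example the POS sequence `VH0 XX RR VVN`, and rejects have followed directly by a noun.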
For semantic taggers in other languages (the Java versions), we do not have auxiliary/main verb rules in place.
How this rule maps to the spaCy pipeline through the UPOS tagset
In the UPOS tagset, and therefore in spaCy's POS models, we can use the AUX POS tag instead of the CLAWS tags VB* (be), VD* (do), and VH* (have). Below is the code for running the small English spaCy model on the sentence I have finished my lunch.:
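import spacy

# Load the small English pipeline, which includes a UPOS tagger.
nlp = spacy.load('en_core_web_sm')
doc = nlp('I have finished my lunch.')
# Print each token alongside its UPOS tag.
print('Token\tPOS')
for token in doc:
    print(f'{token.text}\t{token.pos_}')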
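One way the AUX tag could drive the semantic tag choice is sketched below. This is an assumption about how such a rule might look, not the tagger's actual behaviour: `resolve_verb_tags` is a hypothetical helper, the UPOS tags are hand-written stand-ins for spaCy output so the sketch runs without a model, the candidate tag lists are illustrative, and every token is assumed to have at least one candidate tag.

```python
def resolve_verb_tags(tokens, candidates):
    """Pick one USAS tag per token using the UPOS tag.

    tokens: list of (text, upos) pairs.
    candidates: list of non-empty candidate USAS tag lists, one per token.
    """
    resolved = []
    for (text, upos), tags in zip(tokens, candidates):
        if upos == "AUX":
            # Auxiliary verbs go to the grammatical bin.
            resolved.append("Z5")
        elif upos == "VERB":
            # Main verbs prefer the first non-Z5 candidate when one exists.
            non_z5 = [tag for tag in tags if tag != "Z5"]
            resolved.append(non_z5[0] if non_z5 else tags[0])
        else:
            resolved.append(tags[0])
    return resolved


# Hand-written UPOS tags for "I have finished my lunch."
tokens = [("I", "PRON"), ("have", "AUX"), ("finished", "VERB"),
          ("my", "PRON"), ("lunch", "NOUN"), (".", "PUNCT")]
# Illustrative candidate tags; A9+ on "have" is a placeholder.
candidates = [["Z8mf"], ["A9+", "Z5"], ["T2-"], ["Z8"], ["F1"], ["PUNC"]]
print(resolve_verb_tags(tokens, candidates))
```

With a real pipeline, the `tokens` list could be built as `[(t.text, t.pos_) for t in nlp(text)]`.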
I've updated the comment to explain things further. It'd be good to find some evaluation of how accurate the auxiliary verb detection is in spaCy. We described our original approach in this UCREL technical paper: https://ucrel.lancs.ac.uk/papers/techpaper/vol3.pdf
To incorporate auxiliary verb rules into the USAS Rule Based Tagger.
Definition of auxiliary verb rules
All POS tags used here are from the CLAWS C7 tagset.