Handling unigram and bigram features at the same time in word2features

Hello,
I am trying to perform an NER experiment on a custom dataset containing a lot of food items.
I have labels for certain unigrams and bigrams for my training data.

My label corpus contains "green chilli" = "vegetable", but I don't have "chilli" on its own as a label.
I am using this label list in order to annotate sentences for NER.

For example:

A sentence might contain a bigram such as "green chilli" with its associated label = "vegetable".

Currently, while generating the features, I am marking both "green" and "chilli" as "vegetable".
My annotation pipeline is as follows:

- Split the sentence into **unigrams**
- Check if the **unigram** exists in the label corpus -> if it does, mark the unigram with that **label**
- Get a bigram by considering **token + sentence[idx+1]** or **token + sentence[idx-1]**
- Check if the bigram exists in the label corpus -> if it does, mark both the token and sentence[idx+1] (or sentence[idx-1]) with that label

As a result of the fourth step, both **green** and **chilli** get marked as **vegetable**.
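
The pipeline above can be sketched roughly as follows. This is a simplified illustration rather than the exact code: the function name `annotate`, the `"O"` tag for unlabelled tokens, and the dictionary layout (keys are space-joined tokens) are all assumptions made for the example.

```python
from typing import Dict, List


def annotate(tokens: List[str], labels: Dict[str, str], default: str = "O") -> List[str]:
    """Tag each token using unigram and bigram lookups (hypothetical helper)."""
    tags = [default] * len(tokens)

    # Steps 1-2: mark every unigram that appears in the label corpus.
    for idx, token in enumerate(tokens):
        if token in labels:
            tags[idx] = labels[token]

    # Steps 3-4: look at each adjacent pair once. Scanning left to right
    # covers both the "token + next" and "previous + token" views.
    for idx in range(len(tokens) - 1):
        bigram = f"{tokens[idx]} {tokens[idx + 1]}"
        if bigram in labels:
            # Both halves of the bigram receive the same label.
            tags[idx] = labels[bigram]
            tags[idx + 1] = labels[bigram]

    return tags


labels = {"green chilli": "vegetable"}
tokens = "add one green chilli".split()
print(annotate(tokens, labels))
# -> ['O', 'O', 'vegetable', 'vegetable']
```

Because the bigram label is copied onto each token, nothing in the resulting tags tells the model that the two tokens belong to a single entity.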

So when I train my model and run inference on a test sentence containing **"green chilli"**, I get **"vegetable"** twice, once for each token.

What would be the best way to annotate this using word2features?
