lexsub : context-sensitive word substitutions using Word2Vec

Disambiguating between the possible senses of a word in the context of a sentence is a fundamental problem in NLP. However, this assumes a universal set of "meanings" to disambiguate between. A more natural and more practical task is to find a good substitute for a word in context. For example, in the sentence "She went to the bar last night", we know that "bar" means "pub", but the word has other meanings, such as a chocolate bar or a ban/restriction on something.

This repository uses a Word2Vec embedding trained on the Google News corpus (made available here and through the gensim library) to rank candidate word substitutions by how well they fit the context of the sentence.
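As a rough illustration of what "ranking candidates by contextual fit" can mean, the sketch below scores each candidate by its average cosine similarity to the context words. The toy 3-dimensional vectors, the function name `rank_candidates`, and the scoring rule itself are assumptions made for illustration; this repository's actual scoring may differ.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_candidates(candidates, context_words, vectors):
    """Hypothetical helper: sort candidates by mean cosine similarity
    to the context words, best fit first. Words without a vector are skipped."""
    context = [vectors[w] for w in context_words if w in vectors]
    scored = []
    for cand in candidates:
        if cand not in vectors or not context:
            continue
        score = sum(cosine(vectors[cand], c) for c in context) / len(context)
        scored.append((cand, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [cand for cand, _ in scored]

# Toy vectors standing in for real Word2Vec embeddings (assumption).
toy_vectors = {
    "drink": [1.0, 0.1, 0.0],
    "night": [0.8, 0.2, 0.1],
    "pub": [0.9, 0.2, 0.0],
    "chocolate": [0.0, 1.0, 0.1],
    "ban": [0.0, 0.1, 1.0],
}

ranking = rank_candidates(["ban", "chocolate", "pub"], ["drink", "night"], toy_vectors)
print(ranking)
# ['pub', 'chocolate', 'ban']
```

Because the helper only uses `in` and `[]` lookups, a gensim `KeyedVectors` object could in principle be passed in place of the toy dictionary.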

Setup

Download the Google News word vectors from here and make sure you have the gensim package installed.
Make sure you have installed nltk (Natural Language Toolkit) and downloaded the Lin thesaurus and WordNet corpora by running the following in a Python console: import nltk; nltk.download('lin_thesaurus'); nltk.download('wordnet')

Example Usage

from lexsub import LexSub
from gensim.models import KeyedVectors

word2vec_path = "/path/to/GoogleNews-vectors-negative300.bin"
vectors = KeyedVectors.load_word2vec_format(word2vec_path, binary=True)
ls = LexSub(vectors, candidate_generator='lin')

sentence = "She had a drink at the bar"
target = "bar.n"
result = ls.lex_sub(target, sentence)
print(result)
# ['bars', 'pub', 'tavern', 'nightclub', 'restaurant']
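The target string "bar.n" in the example looks like a lemma followed by a part-of-speech tag (n for noun), although the format is not spelled out here. Under that assumption, a minimal sketch of splitting such a target could look like the following. The helper name `split_target` and the set of accepted tags (the WordNet-style letters n, v, a, r) are hypothetical and not part of the lexsub API.

```python
# Assumed tag set: WordNet-style POS letters (noun, verb, adjective, adverb).
VALID_POS = {"n", "v", "a", "r"}

def split_target(target):
    """Hypothetical helper: split 'lemma.pos' into (lemma, pos).

    Splits on the last dot so that lemmas containing dots are preserved.
    """
    lemma, sep, pos = target.rpartition(".")
    if not sep or not lemma or pos not in VALID_POS:
        raise ValueError(f"expected '<lemma>.<pos>', got {target!r}")
    return lemma, pos

print(split_target("bar.n"))
# ('bar', 'n')
```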
