Custom regex for words.txt

Asher uses the NLP system called compromise, which provides a neat way to look up and grab words in a text based on their parsed, interpreted representations, as opposed to just their characters.

For ease of use, it superficially resembles regex.

Results are an array of Terms objects, which allows you to manipulate individual matches, or operate on them in bulk. Transformations to matches apply to the original terms themselves, so you can efficiently inspect, transform, then return your parsed text.

Basic matching

Word-for-word matches look each term up directly, against both its normalised and its non-normalised text:

let matches = nlp('John eats glue!').match('john eats glue').out('text')
//"John eats glue"

POS matching

You can loosen a search by matching on any part-of-speech tag, which lets you find, for example, all the things John eats:

let matches = nlp('John eats glue').match('john eats #Noun').out('text')
//"John eats glue"

Tags can also be optional (?) or greedy (+):

nlp('he is good').match('#Adverb? good').out('text')
//'good'
nlp('he is really, really good').match('#Adverb+ good').out('text')
//'really, really good'

Wildcard matching

The . character means 'any one term'.

let matches = nlp('John eats glue').match('john . glue').out('text')
//"John eats glue"

The * character means 'any number of terms, until the next part of the pattern matches'. It may match zero terms.

let matches = nlp('John always ravenously eats his glue').match('john * eats').out('text')
//"John always ravenously eats"
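To make the term-level meaning of . and * concrete, here is a minimal sketch of a toy matcher in plain JavaScript. It is not how compromise is implemented; the names toyMatch, matchFrom and normalize are hypothetical, and the matcher only understands literal words, . and *. It treats * lazily, stopping at the first point where the rest of the pattern fits, which is an assumption made for this illustration.

```javascript
// Toy term-level matcher: an illustration only, NOT compromise's implementation.
// Supports literal words, '.' (exactly one term) and '*' (zero or more terms).

// Lower-case a word and strip punctuation so 'Glue!' compares equal to 'glue'.
function normalize(word) {
  return word.toLowerCase().replace(/[^a-z0-9']/g, '');
}

// Try to match pattern[p..] against terms[t..].
// Returns the index just past the match, or -1 if there is no match.
function matchFrom(terms, t, pattern, p) {
  if (p === pattern.length) return t;
  const token = pattern[p];
  if (token === '*') {
    // Lazy: skip 0, 1, 2, ... terms until the rest of the pattern fits.
    for (let skip = t; skip <= terms.length; skip++) {
      const end = matchFrom(terms, skip, pattern, p + 1);
      if (end !== -1) return end;
    }
    return -1;
  }
  if (t >= terms.length) return -1;
  if (token === '.' || normalize(terms[t]) === token) {
    return matchFrom(terms, t + 1, pattern, p + 1);
  }
  return -1;
}

// Return the first matching run of terms as text, or null.
function toyMatch(text, patternText) {
  const terms = text.split(/\s+/).filter(Boolean);
  const pattern = patternText.toLowerCase().split(/\s+/).filter(Boolean);
  for (let start = 0; start < terms.length; start++) {
    const end = matchFrom(terms, start, pattern, 0);
    if (end !== -1) return terms.slice(start, end).join(' ');
  }
  return null;
}

console.log(toyMatch('John eats glue', 'john . glue'));
// "John eats glue"
console.log(toyMatch('John always ravenously eats his glue', 'john * eats'));
// "John always ravenously eats"
console.log(toyMatch('John eats glue', 'john * eats'));
// "John eats" (the * matched zero terms)
```

The point of the sketch is that both wildcards count whole terms, not characters, which is the main difference from an ordinary regular expression.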

Optional matching

The ? character at the end of a word means that word is optional: the pattern matches whether or not it is present.

let matches = nlp('John eats glue').match('john always? eats glue').out('text')
//"John eats glue"

let matches = nlp('John eats glue').match('john #Adverb? eats glue').out('text')
//"John eats glue"

Greedy matches

The + character at the end of a tag (or .) makes the match greedy: it continues across repeated, consecutive matches:

nlp('john, david, and joe went fishing').match('#Person+ and joe').out('text')
//'john, david and joe'

List of options

(word1|word2): parentheses list the alternatives accepted at that position in the pattern.

let matches = nlp('John eats glue').match('john (eats|sniffs|wears) .').out('text')
//"John eats glue"

Actual RegEx /../

You can run a JavaScript regular expression against every word in your document, if you wish, using the /myregex/ syntax:

nlp('it is raining and had rained').match('#Verb /rain[ing|ed]/').out('array')

Note that this will not match multiple-word patterns, and will be slower than other lookups such as (#Verb raining|#Verb rained).
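One thing worth checking in the regex above: in a plain JavaScript regular expression, square brackets form a character class, so [ing|ed] means 'one of the characters i, n, g, |, e, d' rather than 'ing or ed'. The sketch below uses only built-in JavaScript to show the difference. Whether compromise's pattern parser accepts parentheses inside a /.../ token is not confirmed here, so treat the grouped form as a statement about JavaScript regexes only.

```javascript
// Plain JavaScript only; no NLP library involved.
const charClass = /rain[ing|ed]/; // "rain" followed by ONE of: i, n, g, |, e, d
const group = /^rain(ing|ed)$/;   // exactly "raining" or "rained"

const words = ['raining', 'rained', 'raindrop', 'rains'];

const classHits = words.filter((w) => charClass.test(w));
const groupHits = words.filter((w) => group.test(w));

console.log(classHits); // [ 'raining', 'rained', 'raindrop' ]
console.log(groupHits); // [ 'raining', 'rained' ]
```

The character-class version does find both 'raining' and 'rained', but only because each happens to continue with a letter from the class; it also accepts words such as 'raindrop'.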

Capture-groups

You can find a match and return only a subset of it, using [] brackets around any group. Using this pattern you can effectively do 'look-arounds', adding conditions to a match statement without including them in the result.

nlp('i saw ralf eat the glue').match('#Person [#Verb the #Noun]').out('normal')
//"eat the glue"

Location flags

A leading ^ character means 'at the start of a sentence'.

let matches = nlp('John eats glue').match('^john eats ...').out('text')
//"John eats glue"

An ending $ character means 'must be at the end of the sentence'.

let matches = nlp('John eats glue').match('eats glue$').out('text')
//"eats glue"

Negative

You can specify a negative match (a term that must not be there) with a ! character:

let str = 'Homer Simpson and Homer Adkins'
nlp(str).match('homer !simpson').out()
//'Homer Adkins'

Range

You can specify a minimum and maximum number of repeated terms, like this:

let str = 'homer j j j j simpson'
nlp(str).match('homer #Acronym{2,6} simpson').out()
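As a rough illustration of what a {min,max} range means at the term level, the sketch below counts consecutive middle terms between two fixed words. It is a toy in plain JavaScript, not compromise's implementation: hasRange and isSingleLetter are hypothetical names, and the single-letter test is only a stand-in for the #Acronym tag. The scan is greedy and does not backtrack, which is enough for this example but would not be for every pattern.

```javascript
// Toy illustration of a {min,max} range; NOT compromise's implementation.
// Returns true if `terms` contains `first`, then between `min` and `max`
// consecutive terms that pass `test`, then `last`.
function hasRange(terms, first, test, min, max, last) {
  for (let i = 0; i < terms.length; i++) {
    if (terms[i] !== first) continue;
    let count = 0;
    let j = i + 1;
    // Greedily consume matching terms, but never more than `max`.
    while (j < terms.length && count < max && test(terms[j])) {
      count++;
      j++;
    }
    if (count >= min && terms[j] === last) return true;
  }
  return false;
}

// Assumption: a single letter stands in for a term tagged #Acronym.
const isSingleLetter = (w) => /^[a-z]$/.test(w);

const terms = 'homer j j j j simpson'.split(' ');

console.log(hasRange(terms, 'homer', isSingleLetter, 2, 6, 'simpson')); // true
console.log(hasRange(terms, 'homer', isSingleLetter, 2, 3, 'simpson')); // false
console.log(hasRange(terms, 'homer', isSingleLetter, 5, 6, 'simpson')); // false
```

With four middle terms, {2,6} succeeds, {2,3} fails because too many terms sit between the two words, and {5,6} fails because there are too few.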

Prefix, suffix, infix

You can look for sub-word matches using the _ character, which stands for the rest of the word on that side:

var r = nlp(`it's kind of a funny story`)
r.match('_nny') //funny
r.match('fu_')  //funny
r.match('_nn_') //funny
r.match('_d story').found //false
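The three forms above correspond to suffix, prefix and infix tests on a single word. The sketch below spells that out in plain JavaScript; subwordMatch is a hypothetical helper written for this illustration, not a compromise function, and it only handles one word at a time.

```javascript
// Hypothetical helper, not part of any library: tests one word against a
// pattern where '_' marks the side(s) on which extra characters may appear.
function subwordMatch(word, pattern) {
  const open = pattern.startsWith('_');
  const close = pattern.length > 1 && pattern.endsWith('_');
  const core = pattern.slice(open ? 1 : 0, close ? -1 : undefined);
  if (open && close) return word.includes(core); // infix:  _nn_
  if (open) return word.endsWith(core);          // suffix: _nny
  if (close) return word.startsWith(core);       // prefix: fu_
  return word === core;                          // plain word
}

const words = "it's kind of a funny story".split(' ');
const find = (pattern) => words.filter((w) => subwordMatch(w, pattern));

console.log(find('_nny')); // [ 'funny' ]
console.log(find('fu_'));  // [ 'funny' ]
console.log(find('_nn_')); // [ 'funny' ]
console.log(find('_d'));   // [ 'kind' ]
```

The last line also explains why '_d story' finds nothing in the example above: the only word ending in 'd' is 'kind', and it is followed by 'of', not 'story'.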
