Non-uniform tokenization of sentences having dialogue

A sentence that contains both quoted and non-quoted words is not split into sentences consistently.

Given a sentence such as:
"Where were you?" asked Mary angrily.

CoreNLP returns roughly half of such inputs as a single sentence:
1. "Where were you?" asked Mary angrily.

and the other half as two sentences:
1. "Where were you?"
2. asked Mary angrily.

This occurs when the following code is run (with the most recent version):

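```
import java.util.List;
import java.util.Properties;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

// Configure the pipeline; ssplit is the annotator that decides sentence boundaries.
Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, depparse");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

// doc is assumed to be a String holding the input text.
Annotation document = new Annotation(doc);
pipeline.annotate(document);

// One CoreMap per detected sentence.
List<CoreMap> sentences = document.get(SentencesAnnotation.class);
```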
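
One possible stopgap, until the splitting is consistent, is to re-join the fragments after the fact. The sketch below is a hypothetical helper and not part of CoreNLP: the class name `QuoteSentenceMerger` and its methods are made up for illustration. It assumes the sentence texts have already been extracted as plain strings, and it uses a simple heuristic: if a sentence ends with a closing quote and the next one starts with a lowercase letter, the two are treated as one sentence.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of CoreNLP: re-joins a reporting clause
// (e.g. "asked Mary angrily.") that was split off from its quotation.
class QuoteSentenceMerger {

    // True if the text ends with a closing quote: straight ("), curly (”), or PTB-style ('').
    static boolean endsWithClosingQuote(String s) {
        String t = s.trim();
        return t.endsWith("\"") || t.endsWith("\u201D") || t.endsWith("''");
    }

    // True if the first character is a lowercase letter, which suggests
    // a continuation of the previous sentence rather than a new one.
    static boolean startsWithLowercase(String s) {
        String t = s.trim();
        return !t.isEmpty() && Character.isLowerCase(t.charAt(0));
    }

    // Returns a new list in which each lowercase-initial fragment that follows
    // a quote-final sentence has been appended to that sentence.
    static List<String> merge(List<String> sentences) {
        List<String> merged = new ArrayList<>();
        for (String sentence : sentences) {
            int last = merged.size() - 1;
            if (last >= 0
                    && endsWithClosingQuote(merged.get(last))
                    && startsWithLowercase(sentence)) {
                merged.set(last, merged.get(last).trim() + " " + sentence.trim());
            } else {
                merged.add(sentence);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> split = Arrays.asList("\"Where were you?\"", "asked Mary angrily.");
        for (String s : merge(split)) {
            System.out.println(s); // prints: "Where were you?" asked Mary angrily.
        }
    }
}
```

This only repairs the text of the sentences. Any per-sentence annotations (POS tags, dependency parses) computed on the split fragments would not be merged, and the lowercase heuristic will miss reporting clauses that begin with a proper noun, such as `Mary asked angrily.`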

