
Commit 945b276

Merge remote-tracking branch 'refs/remotes/origin/dev' into dev
2 parents: 974383a + 9889f4e

26 files changed (+558 / -151 lines)

README.md

Lines changed: 10 additions & 10 deletions
@@ -1,6 +1,6 @@
 # Stanford CoreNLP
 
-![Build Status](https://github.com/stanfordnlp/CoreNLP/actions/workflows/run-tests.yaml/badge.svg)
+[![Run Tests](https://github.com/stanfordnlp/CoreNLP/actions/workflows/run-tests.yaml/badge.svg)](https://github.com/stanfordnlp/CoreNLP/actions/workflows/run-tests.yaml)
 [![Maven Central](https://img.shields.io/maven-central/v/edu.stanford.nlp/stanford-corenlp.svg)](https://mvnrepository.com/artifact/edu.stanford.nlp/stanford-corenlp)
 [![Twitter](https://img.shields.io/twitter/follow/stanfordnlp.svg?style=social&label=Follow)](https://twitter.com/stanfordnlp/)

@@ -66,15 +66,15 @@ The jars can be directly downloaded from the links below or the Hugging Face Hub
 
 | Language | Model Jar | Last Updated |
 | --- | --- | --- |
-| Arabic | [download](https://nlp.stanford.edu/software/stanford-arabic-corenlp-models-current.jar) [(HF Hub)](https://huggingface.co/stanfordnlp/corenlp-arabic/tree/main) | 4.4.0 |
-| Chinese | [download](https://nlp.stanford.edu/software/stanford-chinese-corenlp-models-current.jar) [(HF Hub)](https://huggingface.co/stanfordnlp/corenlp-chinese/tree/main)| 4.4.0 |
-| English (extra) | [download](https://nlp.stanford.edu/software/stanford-english-extra-corenlp-models-current.jar) [(HF Hub)](https://huggingface.co/stanfordnlp/corenlp-english-extra/tree/main) | 4.4.0 |
-| English (KBP) | [download](https://nlp.stanford.edu/software/stanford-english-kbp-corenlp-models-current.jar) [(HF Hub)](https://huggingface.co/stanfordnlp/corenlp-english-kbp/tree/main) | 4.4.0 |
-| French | [download](https://nlp.stanford.edu/software/stanford-french-corenlp-models-current.jar) [(HF Hub)](https://huggingface.co/stanfordnlp/corenlp-french/tree/main) | 4.4.0 |
-| German | [download](https://nlp.stanford.edu/software/stanford-german-corenlp-models-current.jar) [(HF Hub)](https://huggingface.co/stanfordnlp/corenlp-german/tree/main) | 4.4.0 |
-| Hungarian | [download](https://nlp.stanford.edu/software/stanford-hungarian-corenlp-models-current.jar) [(HF Hub)](https://huggingface.co/stanfordnlp/corenlp-hungarian/tree/main) | 4.4.0 |
-| Italian | [download](https://nlp.stanford.edu/software/stanford-italian-corenlp-models-current.jar) [(HF Hub)](https://huggingface.co/stanfordnlp/corenlp-italian/tree/main)| 4.4.0 |
-| Spanish | [download](https://nlp.stanford.edu/software/stanford-spanish-corenlp-models-current.jar) [(HF Hub)](https://huggingface.co/stanfordnlp/corenlp-spanish/tree/main)| 4.4.0 |
+| Arabic | [download](https://nlp.stanford.edu/software/stanford-arabic-corenlp-models-current.jar) [(HF Hub)](https://huggingface.co/stanfordnlp/corenlp-arabic/tree/main) | 4.5.0 |
+| Chinese | [download](https://nlp.stanford.edu/software/stanford-chinese-corenlp-models-current.jar) [(HF Hub)](https://huggingface.co/stanfordnlp/corenlp-chinese/tree/main)| 4.5.0 |
+| English (extra) | [download](https://nlp.stanford.edu/software/stanford-english-extra-corenlp-models-current.jar) [(HF Hub)](https://huggingface.co/stanfordnlp/corenlp-english-extra/tree/main) | 4.5.0 |
+| English (KBP) | [download](https://nlp.stanford.edu/software/stanford-english-kbp-corenlp-models-current.jar) [(HF Hub)](https://huggingface.co/stanfordnlp/corenlp-english-kbp/tree/main) | 4.5.0 |
+| French | [download](https://nlp.stanford.edu/software/stanford-french-corenlp-models-current.jar) [(HF Hub)](https://huggingface.co/stanfordnlp/corenlp-french/tree/main) | 4.5.0 |
+| German | [download](https://nlp.stanford.edu/software/stanford-german-corenlp-models-current.jar) [(HF Hub)](https://huggingface.co/stanfordnlp/corenlp-german/tree/main) | 4.5.0 |
+| Hungarian | [download](https://nlp.stanford.edu/software/stanford-hungarian-corenlp-models-current.jar) [(HF Hub)](https://huggingface.co/stanfordnlp/corenlp-hungarian/tree/main) | 4.5.0 |
+| Italian | [download](https://nlp.stanford.edu/software/stanford-italian-corenlp-models-current.jar) [(HF Hub)](https://huggingface.co/stanfordnlp/corenlp-italian/tree/main)| 4.5.0 |
+| Spanish | [download](https://nlp.stanford.edu/software/stanford-spanish-corenlp-models-current.jar) [(HF Hub)](https://huggingface.co/stanfordnlp/corenlp-spanish/tree/main)| 4.5.0 |
 
 Thank you to [Hugging Face](https://huggingface.co/) for helping with our hosting!

build.gradle

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ sourceCompatibility = 1.8
 targetCompatibility = 1.8
 compileJava.options.encoding = 'UTF-8'
 
-version = '4.4.0'
+version = '4.5.0'
 
 // Gradle application plugin
 mainClassName = "edu.stanford.nlp.pipeline.StanfordCoreNLP"

doc/corenlp/README.txt

Lines changed: 3 additions & 0 deletions
@@ -41,6 +41,9 @@ LICENSE
 CHANGES
 ---------------------------------
 
+2022-07-21    4.5.0    Tokenizer and lemmatizer upgrades, along with
+                       a new tsurgeon operation and some bugfixes
+
 2022-01-20    4.4.0    Fix issue with Italian depparse, tsurgeon CLI,
                        fix security issues, bug fixes
 
doc/corenlp/pom-full.xml

Lines changed: 4 additions & 4 deletions
@@ -2,7 +2,7 @@
 <modelVersion>4.0.0</modelVersion>
 <groupId>edu.stanford.nlp</groupId>
 <artifactId>stanford-corenlp</artifactId>
-<version>4.4.0</version>
+<version>4.5.0</version>
 <packaging>jar</packaging>
 <name>Stanford CoreNLP</name>
 <description>Stanford CoreNLP provides a set of natural language analysis tools which can take raw English language text input and give the base forms of words, their parts of speech, whether they are names of companies, people, etc., normalize dates, times, and numeric quantities, mark up the structure of sentences in terms of phrases and word dependencies, and indicate which noun phrases refer to the same entities. It provides the foundational building blocks for higher level text understanding applications.</description>
@@ -14,8 +14,8 @@
 </license>
 </licenses>
 <scm>
-<url>https://nlp.stanford.edu/software/stanford-corenlp-4.4.0.zip</url>
-<connection>https://nlp.stanford.edu/software/stanford-corenlp-4.4.0.zip</connection>
+<url>https://nlp.stanford.edu/software/stanford-corenlp-4.5.0.zip</url>
+<connection>https://nlp.stanford.edu/software/stanford-corenlp-4.5.0.zip</connection>
 </scm>
 <developers>
 <developer>
@@ -202,7 +202,7 @@
 <configuration>
 <artifacts>
 <artifact>
-<file>${project.basedir}/stanford-corenlp-4.4.0-models.jar</file>
+<file>${project.basedir}/stanford-corenlp-4.5.0-models.jar</file>
 <type>jar</type>
 <classifier>models</classifier>
 </artifact>

doc/corenlp/pom-light.xml

Lines changed: 4 additions & 4 deletions
@@ -2,7 +2,7 @@
 <modelVersion>4.0.0</modelVersion>
 <groupId>edu.stanford.nlp</groupId>
 <artifactId>stanford-corenlp</artifactId>
-<version>4.4.0</version>
+<version>4.5.0</version>
 <packaging>jar</packaging>
 <name>Stanford CoreNLP</name>
 <description>Stanford CoreNLP provides a set of natural language analysis tools which can take raw English language text input and give the base forms of words, their parts of speech, whether they are names of companies, people, etc., normalize dates, times, and numeric quantities, mark up the structure of sentences in terms of phrases and word dependencies, and indicate which noun phrases refer to the same entities. It provides the foundational building blocks for higher level text understanding applications.</description>
@@ -14,8 +14,8 @@
 </license>
 </licenses>
 <scm>
-<url>https://nlp.stanford.edu/software/stanford-corenlp-4.4.0.zip</url>
-<connection>https://nlp.stanford.edu/software/stanford-corenlp-4.4.0.zip</connection>
+<url>https://nlp.stanford.edu/software/stanford-corenlp-4.5.0.zip</url>
+<connection>https://nlp.stanford.edu/software/stanford-corenlp-4.5.0.zip</connection>
 </scm>
 <developers>
 <developer>
@@ -56,7 +56,7 @@
 <configuration>
 <artifacts>
 <artifact>
-<file>${project.basedir}/stanford-corenlp-4.4.0-models.jar</file>
+<file>${project.basedir}/stanford-corenlp-4.5.0-models.jar</file>
 <type>jar</type>
 <classifier>models</classifier>
 </artifact>

examples/sample-maven-project/pom.xml

Lines changed: 11 additions & 11 deletions
@@ -17,66 +17,66 @@
 <dependency>
 <groupId>edu.stanford.nlp</groupId>
 <artifactId>stanford-corenlp</artifactId>
-<version>4.4.0</version>
+<version>4.5.0</version>
 </dependency>
 <dependency>
 <groupId>edu.stanford.nlp</groupId>
 <artifactId>stanford-corenlp</artifactId>
-<version>4.4.0</version>
+<version>4.5.0</version>
 <classifier>javadoc</classifier>
 </dependency>
 <dependency>
 <groupId>edu.stanford.nlp</groupId>
 <artifactId>stanford-corenlp</artifactId>
-<version>4.4.0</version>
+<version>4.5.0</version>
 <classifier>sources</classifier>
 </dependency>
 <dependency>
 <groupId>edu.stanford.nlp</groupId>
 <artifactId>stanford-corenlp</artifactId>
-<version>4.4.0</version>
+<version>4.5.0</version>
 <classifier>models</classifier>
 </dependency>
 <dependency>
 <groupId>edu.stanford.nlp</groupId>
 <artifactId>stanford-corenlp</artifactId>
-<version>4.4.0</version>
+<version>4.5.0</version>
 <classifier>models-arabic</classifier>
 </dependency>
 <dependency>
 <groupId>edu.stanford.nlp</groupId>
 <artifactId>stanford-corenlp</artifactId>
-<version>4.4.0</version>
+<version>4.5.0</version>
 <classifier>models-chinese</classifier>
 </dependency>
 <dependency>
 <groupId>edu.stanford.nlp</groupId>
 <artifactId>stanford-corenlp</artifactId>
-<version>4.4.0</version>
+<version>4.5.0</version>
 <classifier>models-english</classifier>
 </dependency>
 <dependency>
 <groupId>edu.stanford.nlp</groupId>
 <artifactId>stanford-corenlp</artifactId>
-<version>4.4.0</version>
+<version>4.5.0</version>
 <classifier>models-english-kbp</classifier>
 </dependency>
 <dependency>
 <groupId>edu.stanford.nlp</groupId>
 <artifactId>stanford-corenlp</artifactId>
-<version>4.4.0</version>
+<version>4.5.0</version>
 <classifier>models-french</classifier>
 </dependency>
 <dependency>
 <groupId>edu.stanford.nlp</groupId>
 <artifactId>stanford-corenlp</artifactId>
-<version>4.4.0</version>
+<version>4.5.0</version>
 <classifier>models-german</classifier>
 </dependency>
 <dependency>
 <groupId>edu.stanford.nlp</groupId>
 <artifactId>stanford-corenlp</artifactId>
-<version>4.4.0</version>
+<version>4.5.0</version>
 <classifier>models-spanish</classifier>
 </dependency>
 </dependencies>

itest/src/edu/stanford/nlp/pipeline/RequirementsCorrectSlowITest.java

Lines changed: 7 additions & 7 deletions
@@ -119,27 +119,27 @@ private void testAnnotatorSequence(List<String> annotators) {
 
 @Test
 public void testDefaultPipeline() {
-  testAnnotatorSequence(Arrays.asList("tokenize", "ssplit", "pos", "lemma", "ner", "gender", "parse", "coref"));
+  testAnnotatorSequence(Arrays.asList("tokenize", "pos", "lemma", "ner", "gender", "parse", "coref"));
 }
 
 @Test
 public void testDepparsePipeline() {
-  testAnnotatorSequence(Arrays.asList("tokenize", "ssplit", "pos", "depparse"));
+  testAnnotatorSequence(Arrays.asList("tokenize", "pos", "depparse"));
 }
 
 @Test
 public void testQuotePipeline() {
-  testAnnotatorSequence(Arrays.asList("tokenize","ssplit","pos","lemma","ner","depparse","coref","quote"));
+  testAnnotatorSequence(Arrays.asList("tokenize","pos","lemma","ner","depparse","coref","quote"));
 }
 
-@Test
-public void testTrueCasePipeline() {
-  testAnnotatorSequence(Arrays.asList("tokenize","ssplit","pos","lemma","truecase"));
+@Test
+public void testTrueCasePipeline() {
+  testAnnotatorSequence(Arrays.asList("tokenize","pos","lemma","truecase"));
 }
 
 @Test
 public void testOpenIEPipeline() {
-  testAnnotatorSequence(Arrays.asList("tokenize","ssplit","pos","lemma","depparse","natlog","openie"));
+  testAnnotatorSequence(Arrays.asList("tokenize","pos","lemma","depparse","natlog","openie"));
 }
 
 @Test
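Note on the test change above: every annotator list drops the explicit "ssplit" step, which suggests that in 4.5.0 sentence splitting is handled as part of the tokenize annotator. Below is a minimal sketch of a pipeline configured the same way; the class name and sample text are illustrative, not taken from this commit, and configurations that still list ssplit explicitly are presumably unaffected.

import java.util.Properties;
import edu.stanford.nlp.pipeline.CoreDocument;
import edu.stanford.nlp.pipeline.CoreSentence;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

public class NoSsplitExample {
  public static void main(String[] args) {
    Properties props = new Properties();
    // same style of annotator list as the updated tests: no explicit "ssplit"
    props.setProperty("annotators", "tokenize,pos,lemma,ner");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    CoreDocument doc = new CoreDocument("Stanford is in California. It has an NLP group.");
    pipeline.annotate(doc);

    // sentences should still come back even though ssplit was not listed
    for (CoreSentence sentence : doc.sentences()) {
      System.out.println(sentence.text() + " -> " + sentence.posTags());
    }
  }
}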

itest/src/edu/stanford/nlp/pipeline/StanfordCoreNLPServerITest.java

Lines changed: 1 addition & 1 deletion
@@ -143,7 +143,7 @@ public void testSemgrexJson() throws IOException {
 
 @Test
 public void testSemgrexAnnotation() throws IOException {
-  String expected = "result { result { match { matchIndex: 3 node { name: \"verb\" matchIndex: 3 } node { name: \"obj\" matchIndex: 5 } } }}".replaceAll(" ", "");
+  String expected = "result { result { match { matchIndex: 3 node { name: \"verb\" matchIndex: 3 } node { name: \"obj\" matchIndex: 5 } graphIndex:0 semgrexIndex:0 } }}".replaceAll(" ", "");
   String query = "The dog ate a fish";
   byte[] message = query.getBytes("utf-8");
   Properties props = new Properties();

src/edu/stanford/nlp/parser/lexparser/BaseLexicon.java

Lines changed: 5 additions & 0 deletions
@@ -63,6 +63,11 @@ public class BaseLexicon implements Lexicon {
 protected static final IntTaggedWord NULL_ITW = new IntTaggedWord(nullWord, nullTag);
 
 protected final TrainOptions trainOptions;
+// TODO: remove this link
+// the only reason it is needed is because testOptions has an item,
+// unseenSmooth, which belongs in trainOptions
+// the problem is moving that and/or removing this link will invalidate
+// all existing serialized models
 protected final TestOptions testOptions;
 
 protected final Options op;

src/edu/stanford/nlp/pipeline/CoreNLP.proto

Lines changed: 14 additions & 3 deletions
@@ -627,6 +627,10 @@ message SemgrexRequest {
 // If you pass in M semgrex expressions and N dependency graphs,
 // this returns MxN nested results. Each SemgrexResult can match
 // multiple times in one graph
+//
+// You may want to send multiple semgrexes per query because
+// translating large numbers of dependency graphs to protobufs
+// will be expensive, so doing several queries at once will save time
 message SemgrexResponse {
 message NamedNode {
 required string name = 1;
@@ -639,9 +643,16 @@ message SemgrexResponse {
 }
 
 message Match {
-  required int32 matchIndex = 1;
-  repeated NamedNode node = 2;
-  repeated NamedRelation reln = 3;
+  required int32 matchIndex = 1;
+  repeated NamedNode node = 2;
+  repeated NamedRelation reln = 3;
+  // when processing multiple dependency graphs at once,
+  // which dependency graph this applies to
+  // indexed from 0
+  optional int32 graphIndex = 4;
+  // index of the semgrex expression this match applies to
+  // indexed from 0
+  optional int32 semgrexIndex = 5;
 }
 
 message SemgrexResult {
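The new graphIndex and semgrexIndex fields let a client line up each Match with the dependency graph and the semgrex pattern it came from when several of each are sent in one request. Below is a minimal client-side sketch; it assumes the generated protobuf classes in edu.stanford.nlp.pipeline.CoreNLPProtos follow the nesting shown in the server test above (response -> GraphResult -> SemgrexResult -> Match), and the reader class and printed format are illustrative only.

import edu.stanford.nlp.pipeline.CoreNLPProtos;

public class SemgrexResponseReader {
  // walk an already-parsed SemgrexResponse and report which graph and
  // which semgrex pattern each match belongs to
  public static void printMatches(CoreNLPProtos.SemgrexResponse response) {
    for (CoreNLPProtos.SemgrexResponse.GraphResult graphResult : response.getResultList()) {
      for (CoreNLPProtos.SemgrexResponse.SemgrexResult semgrexResult : graphResult.getResultList()) {
        for (CoreNLPProtos.SemgrexResponse.Match match : semgrexResult.getMatchList()) {
          // graphIndex / semgrexIndex are the optional fields added in this commit,
          // both indexed from 0
          System.out.printf("graph %d, pattern %d, match root at token %d%n",
              match.getGraphIndex(), match.getSemgrexIndex(), match.getMatchIndex());
        }
      }
    }
  }
}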
