@@ -198,8 +198,8 @@ set the environment variables.
  To process English, German, Dutch, Spanish, Italian, French, Chinese or Russian documents,
  the TreeTaggerWrapper can be used for pre-processing:
  * Download the TreeTagger and its tagging scripts, installation scripts, as well as
- English, German, and Dutch (or any other) parameter files into one directory from:
- http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/
+ English, German, and Dutch (and all required) parameter files into one directory from:
+ http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/
  - mkdir treetagger
  - cd treetagger
  - wget http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/tree-tagger-linux-3.2.tar.gz
@@ -211,6 +211,8 @@ set the environment variables.
  - wget http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/italian-par-linux-3.2-utf8.bin.gz
  - wget http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/spanish-par-linux-3.2-utf8.bin.gz
  - wget http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/french-par-linux-3.2-utf8.bin.gz
+ - wget http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/portuguese-par-linux-3.2-utf8.bin.gz
+ - wget http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/estonian-par-linux-3.2-utf8.bin.gz
  Attention: If you do not use Linux, please download all TreeTagger files directly from
  http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/
  * (OPTIONAL) For Chinese documents, please get the Tokenizer and TreeTagger parameter file
@@ -279,9 +281,9 @@ set the environment variables.
  You will need to enter the full path of the hunpos-1.0-linux directory in the
  HunPosTaggerWrapper.

- To process any of the automatically create, you can use the AllLanguagesTokenizer
- which is part of the heideltime kit. It is a simple (whitespace-based) yet generic
- tool and creaetes sentence and token annotation.
+ To process any of the language with automatically created resources, you can use
+ the AllLanguagesTokenizer, which is part of the heideltime kit. It is a simple
+ (whitespace-based) yet generic tool and creaetes sentence and token annotation.


  For sample UIMA workflows for any of the supported languages, please take a look
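The per-language download steps that this change extends could be sketched as a small shell loop. This is only an illustration and not part of the commit: the base URL and file-name pattern are copied from the wget lines in the diff, the language list covers only the parameter files visible in the second hunk, and `DRY_RUN`, `param_url` and `fetch_params` are hypothetical names introduced here. By default the sketch only prints the commands instead of downloading anything.

```shell
#!/bin/sh
# Sketch: fetch the TreeTagger parameter files named in the diff.
# BASE_URL and the file-name pattern come from the wget lines in the diff;
# DRY_RUN, param_url and fetch_params are hypothetical helpers.
BASE_URL="http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data"
LANGS="italian spanish french portuguese estonian"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, anything else = download

param_url() {
    # Build the download URL for one language's UTF-8 parameter file.
    printf '%s/%s-par-linux-3.2-utf8.bin.gz\n' "$BASE_URL" "$1"
}

fetch_params() {
    # Walk the language list and print or run one wget call per file.
    for lang in $LANGS; do
        url=$(param_url "$lang")
        if [ "$DRY_RUN" = "1" ]; then
            echo "wget $url"
        else
            wget "$url" || echo "download failed: $url" >&2
        fi
    done
}

fetch_params
```

Running it with `DRY_RUN=0` from inside the treetagger directory would perform the actual downloads, assuming wget is installed and the files are still hosted at that location.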