der-bruemmer/conlltonif

CoNLL-2009 Dependency-Annotated Corpora to NIF

Transforms a file or string in the CoNLL-2009 dependency tree format (http://ufal.mff.cuni.cz/conll2009-st/task-description.html) into the NLP Interchange Format (NIF).
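For orientation, CoNLL-2009 files hold one token per line, with the columns ID, FORM, LEMMA, PLEMMA, POS, PPOS, FEAT, PFEAT, HEAD, PHEAD, DEPREL, PDEPREL, FILLPRED, PRED and APREDs, and a blank line between sentences. The sketch below is not part of this converter. It only prints a few of those columns so that an input file can be sanity-checked before conversion. The function name `conll_peek` is a made-up helper, and the code assumes tab-separated columns.

```shell
# Hypothetical helper (not part of conlltonif): print ID, FORM, POS, HEAD and
# DEPREL for each token of a CoNLL-2009 file, keeping blank sentence separators.
# Assumes the columns are separated by single tab characters.
conll_peek() {
  awk -F '\t' '
    NF == 0 { print ""; next }   # blank line marks a sentence boundary
    { printf "%s\t%s\t%s\t%s\t%s\n", $1, $2, $5, $9, $11 }
  ' "$1"
}
```

Calling `conll_peek corpus.conll | head` (with `corpus.conll` standing in for your own file) shows the first few tokens. If the POS column does not look like the tagset you intend to pass via tagset, the conversion is unlikely to produce useful OLiA links.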

The output is a single NIF file with one nif:Context element covering the whole input text.

Parameters

In addition to the usual NIF API parameters, the tagset parameter specifies the tagset used by the corpus. Choose one of the OLiA tagsets implemented here.

You can also set an output file via the outfile parameter.

Example

mvn exec:java -e -Dexec.mainClass="org.nlp2rdf.implementation.conll.ConLLToNIFCLI" -Dexec.args="-intype file -f text -i $conllfile.conll -tagset Stts -outfile $conllfile.ttl"
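To convert several files, the command above can be wrapped in a loop. The following is a minimal sketch, not a script shipped with this repository. The function names `conll_exec_args` and `conll_convert_dir`, the `DRY_RUN` switch and the one-directory layout are assumptions made for illustration. Only the `mvn exec:java` invocation and its arguments are taken from the example above.

```shell
# Hypothetical helpers, not part of conlltonif.

# Build the -Dexec.args value for one input file. The output file gets the
# same name with a .ttl extension. $1 = input .conll file, $2 = tagset.
conll_exec_args() {
  printf '%s\n' "-intype file -f text -i $1 -tagset $2 -outfile ${1%.conll}.ttl"
}

# Convert every .conll file in a directory. $1 = directory, $2 = tagset
# (defaults to Stts, as in the example). Set DRY_RUN=1 to print the Maven
# commands instead of running them.
conll_convert_dir() {
  dir=$1
  tagset=${2:-Stts}
  for f in "$dir"/*.conll; do
    [ -e "$f" ] || continue      # no matches: the glob stays literal, skip it
    args=$(conll_exec_args "$f" "$tagset")
    if [ "${DRY_RUN:-0}" = 1 ]; then
      printf 'mvn exec:java -e -Dexec.mainClass="%s" -Dexec.args="%s"\n' \
        org.nlp2rdf.implementation.conll.ConLLToNIFCLI "$args"
    else
      mvn exec:java -e \
        -Dexec.mainClass="org.nlp2rdf.implementation.conll.ConLLToNIFCLI" \
        -Dexec.args="$args"
    fi
  done
}
```

Run `DRY_RUN=1; conll_convert_dir corpora Stts` first to review the generated commands (`corpora` is a placeholder directory). Because the arguments are passed as one whitespace-separated string, this sketch does not handle file names containing spaces.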

Because it uses the Jena OntModel, the application is very memory-intensive. You may want to set

export MAVEN_OPTS="-Xmx4000m -XX:+UseConcMarkSweepGC"

before running the application. The -XX:+UseConcMarkSweepGC flag was removed in JDK 14, so omit it on newer JVMs.

Data

Most corpora in this format are subject to license restrictions, so please refer to the respective data owners. One major resource is the German TIGER Corpus, which is free for research use.
