Transforms a file or string in the CoNLL-2009 dependency tree format (http://ufal.mff.cuni.cz/conll2009-st/task-description.html) to the NIF format.
Output will be one NIF file with one nif:Context element for the whole input text.
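To check that an input file matches the expected layout before converting it, a quick column dump can help. The following is a minimal sketch, assuming the tab-separated column order given in the task description linked above (ID, FORM, LEMMA, PLEMMA, POS, PPOS, FEAT, PFEAT, HEAD, PHEAD, DEPREL, PDEPREL, FILLPRED, PRED, then argument columns). The helper name conll_summary and the sample rows are made up for illustration and are not part of this project or of any corpus.

```shell
# Hypothetical helper: print ID, FORM, POS, HEAD and DEPREL
# (columns 1, 2, 5, 9 and 11) for every token line.
# Blank lines (sentence separators) and short lines are skipped.
conll_summary() {
  awk -F'\t' 'NF >= 11 { print $1, $2, $5, $9, $11 }' "$@"
}

# Hand-written two-token sample with illustrative tags and labels.
printf '1\tDas\tder\tder\tART\tART\t_\t_\t2\t2\tNK\tNK\t_\t_\n2\tHaus\tHaus\tHaus\tNN\tNN\t_\t_\t0\t0\tROOT\tROOT\t_\t_\n' | conll_summary
```

The POS column is the one the tagset parameter refers to, so it is worth confirming that its values belong to the tagset you intend to pass.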
In addition to the usual NIF API parameters, the tagset parameter sets the tagset used by the corpus. Choose one of the OLiA tagsets implemented here.
You can also set an output file via the outfile parameter:
mvn exec:java -e -Dexec.mainClass="org.nlp2rdf.implementation.conll.ConLLToNIFCLI" -Dexec.args="-intype file -f text -i $conllfile.conll -tagset Stts -outfile $conllfile.ttl"
Because it uses the Jena OntModel, the application is very memory intensive. You may want to set
MAVEN_OPTS="-Xmx4000m -XX:+UseConcMarkSweepGC"
before running the application.
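The memory setting and the conversion command above can be combined in one small script. This is a minimal sketch rather than a script shipped with the project: conll_args is a hypothetical helper, "corpus" is an assumed base name for the input file, and Stts is simply the tagset from the example command. Note that the CMS collector was removed in JDK 14, so on newer JVMs the -XX:+UseConcMarkSweepGC flag will probably need to be dropped.

```shell
# Hypothetical helper: build the -Dexec.args value for a given
# base name ($1) and tagset ($2), mirroring the example command.
conll_args() {
  printf '%s\n' "-intype file -f text -i $1.conll -tagset $2 -outfile $1.ttl"
}

# Heap size and GC flag as suggested above; adjust to your machine and JDK.
export MAVEN_OPTS="-Xmx4000m -XX:+UseConcMarkSweepGC"

conllfile="corpus"   # assumed base name, i.e. the input is corpus.conll

# Only invoke Maven when it is installed and the input file exists.
if command -v mvn >/dev/null 2>&1 && [ -f "$conllfile.conll" ]; then
  mvn exec:java -e \
    -Dexec.mainClass="org.nlp2rdf.implementation.conll.ConLLToNIFCLI" \
    -Dexec.args="$(conll_args "$conllfile" Stts)"
fi
```

Exporting MAVEN_OPTS in the same shell session ensures the JVM started by Maven picks up the larger heap.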
Most data in this format is subject to licensing restrictions, so please consult the respective data owners before using it. One major resource is the German TIGER Corpus, which is free for research use.