Steps to run the Chinese Word Count Module

RAJDEEP KAUR edited this page Feb 4, 2018 · 8 revisions

Download the library required for the Chinese word count from: https://nlp.stanford.edu/software/stanford-segmenter-2017-06-09.zip
Unzip it and locate the file "ctb.gz" in the data folder "/stanford-segmenter-2017-06-09/data".
Upload the input.txt file and the dictionary file, with UTF-8 encoding, to the TACIT word count module.
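
The unzip-and-locate step above can also be scripted. The following is a minimal Python sketch, not part of TACIT: the function name `extract_dictionary` and its parameters are hypothetical, and it assumes the archive has already been downloaded from the link above.

```python
import zipfile
from pathlib import Path

# File name taken from the steps above.
DICTIONARY_NAME = "ctb.gz"


def extract_dictionary(zip_path, dest_dir):
    """Unzip the segmenter archive and return the path to data/ctb.gz.

    Hypothetical helper for illustration; TACIT itself does not provide it.
    """
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    # Look for a file named ctb.gz whose parent folder is called "data".
    for candidate in sorted(dest.rglob(DICTIONARY_NAME)):
        if candidate.parent.name == "data":
            return candidate
    raise FileNotFoundError(f"{DICTIONARY_NAME} not found under {dest}")
```

For example, `extract_dictionary("stanford-segmenter-2017-06-09.zip", "segmenter")` would return the location of "ctb.gz", which is the file TACIT asks for the first time it processes Chinese text.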

Tip: (Windows) Open the file in Notepad and click 'Save As...'. The 'Encoding:' combo box shows the current file format. If it is not UTF-8, save the file as UTF-8. (Mac) Use TextEdit for the input.
When the word count runs for the first time with Chinese text, it will prompt for the Chinese dictionary. Add the "ctb.gz" file from step 2 to TACIT at that prompt; it will not be required again.
You can now check the output files after TACIT word count finishes.
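
The UTF-8 check described in the tip can also be done with a short script instead of a text editor. This is a minimal Python sketch under stated assumptions: the helper names `is_utf8` and `convert_to_utf8` are hypothetical, and the default source encoding `gb18030` is only a guess at what a non-UTF-8 Chinese text file might use. Replace it with the file's actual encoding.

```python
from pathlib import Path


def is_utf8(path):
    """Return True if the file's bytes decode cleanly as UTF-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def convert_to_utf8(path, source_encoding="gb18030"):
    """Rewrite the file in place as UTF-8.

    source_encoding is an assumption; the wrong value yields garbled text
    or a UnicodeDecodeError.
    """
    text = Path(path).read_bytes().decode(source_encoding)
    # Write bytes directly so line endings are left untouched.
    Path(path).write_bytes(text.encode("utf-8"))
```

A typical use would be to call `is_utf8("input.txt")` before uploading and run `convert_to_utf8("input.txt")` only if it returns False. Note that the check only confirms the bytes are valid UTF-8; it cannot tell whether the decoded text is the intended one.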