Steps to run the Chinese Word Count Module
- Download the library required for the Chinese word count: https://nlp.stanford.edu/software/stanford-segmenter-2017-06-09.zip
- Unzip the archive and find the file "ctb.gz" in the data folder "/stanford-segmenter-2017-06-09/data".

- Upload the input.txt file and the dictionary file, both saved with UTF-8 encoding, in the TACIT word count module.
  Tip: (Windows) Open the file in Notepad and click 'Save As...'. The 'Encoding:' combo box shows the current file encoding. If it is not UTF-8, save the file as UTF-8. (Mac) Use TextEdit for the input.
- When the word count runs for the first time with Chinese text, TACIT prompts for the Chinese dictionary. Add the "ctb.gz" file from the unzipped segmenter data folder. This is only needed the first time; TACIT will not ask for it again.



- You can now check the output files after the TACIT word count finishes.
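
The two file checks in the steps above (locating "ctb.gz" and making sure the input is UTF-8) can also be scripted. The following Python sketch is not part of TACIT. The function names `find_dictionary` and `ensure_utf8` are hypothetical, and the default source encoding `gb18030` is only an assumption about how a non-UTF-8 Chinese text file might have been saved; change it to match your file.

```python
from pathlib import Path


def find_dictionary(segmenter_dir, name="ctb.gz"):
    """Return the path of the dictionary file inside the unzipped segmenter folder.

    `segmenter_dir` is the folder created by unzipping the archive,
    e.g. "stanford-segmenter-2017-06-09".
    """
    candidate = Path(segmenter_dir) / "data" / name
    if not candidate.is_file():
        raise FileNotFoundError(f"{name} not found in {candidate.parent}")
    return candidate


def ensure_utf8(path, source_encoding="gb18030"):
    """Make sure the file at `path` is UTF-8 encoded.

    Returns False if the file was already valid UTF-8 (nothing is written),
    or True if it was decoded with `source_encoding` (an assumed encoding)
    and rewritten in place as UTF-8.
    """
    path = Path(path)
    raw = path.read_bytes()
    try:
        raw.decode("utf-8")
        return False  # already valid UTF-8, leave the file untouched
    except UnicodeDecodeError:
        pass
    # Raises UnicodeDecodeError if the assumed source encoding is also wrong.
    text = raw.decode(source_encoding)
    # Write bytes directly so line endings are preserved exactly.
    path.write_bytes(text.encode("utf-8"))
    return True
```

For example, `ensure_utf8("input.txt")` rewrites the file only when it is not already UTF-8, and `find_dictionary("stanford-segmenter-2017-06-09")` returns the path to pass to TACIT when it prompts for the Chinese dictionary. Keep a backup of the input file before converting, since a wrong `source_encoding` can decode without error and still produce garbled text.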