GitHub - kaisadadi/n-gram-adding-one-and-good-turing-: for NLP lesson assignment

This is an assignment for an NLP course. It implements uni-gram and bi-gram language models with adding-one and good-turing smoothing.
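As a rough illustration of the two smoothing techniques named above, here is a minimal Python sketch. It is not taken from this repository: the function names, the `<s>`/`</s>` padding tokens, and the fallback used when a Good-Turing frequency-of-frequency is zero are all assumptions made for the example, and the actual code in adding_one and good turing may differ.

```python
from collections import Counter

def count_ngrams(sentences):
    """Count unigrams and bigrams over tokenized sentences (lists of tokens).

    The <s> and </s> boundary tokens are an assumption of this sketch.
    """
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + list(tokens) + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def add_one_prob(prev, word, unigrams, bigrams):
    """Add-one estimate: P(word | prev) = (c(prev, word) + 1) / (c(prev) + V)."""
    vocab_size = len(unigrams)  # V: number of distinct token types seen
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

def good_turing_counts(counts):
    """Good-Turing adjusted counts: c* = (c + 1) * N_{c+1} / N_c.

    N_c is the number of items seen exactly c times. When N_{c+1} is zero,
    this sketch simply keeps the raw count (a simplification, hypothetical).
    """
    freq_of_freq = Counter(counts.values())
    adjusted = {}
    for item, c in counts.items():
        n_c, n_c1 = freq_of_freq[c], freq_of_freq[c + 1]
        adjusted[item] = (c + 1) * n_c1 / n_c if n_c1 > 0 else float(c)
    return adjusted

unigrams, bigrams = count_ngrams([["a", "b"], ["a", "c"]])
p_seen = add_one_prob("a", "b", unigrams, bigrams)    # bigram seen once
p_unseen = add_one_prob("a", "z", unigrams, bigrams)  # bigram never seen
adjusted = good_turing_counts(bigrams)
```

Add-one keeps every conditional probability non-zero at the cost of shifting a lot of mass to unseen events, while Good-Turing re-estimates each count from how many items share the next-higher count.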

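The dataset comes from the Peking University corpus; see the dataset folder. Because of upload limits, everything except testB has to be obtained by extracting the zip archive. testB was constructed by hand so that a single sentence could be picked from it and analyzed in the report.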

To support several ways of reading the data, a dataset class is provided; the meanings of mode/word/type are explained in the report.

For an analysis of the experimental results, see the report. The evaluation metric used in the report is perplexity.
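For reference, perplexity is commonly computed as the exponential of the average negative log-probability per token. The sketch below shows that standard definition; the function name is hypothetical, and whether the report counts sentence-boundary tokens or uses a different log base is not stated here, so treat this only as an illustration.

```python
import math

def perplexity(token_probs):
    """Perplexity = exp(-(1/N) * sum(log p_i)) over N per-token probabilities."""
    if not token_probs:
        raise ValueError("need at least one probability")
    log_sum = sum(math.log(p) for p in token_probs)
    return math.exp(-log_sum / len(token_probs))

# A model that assigns probability 0.25 to every token is as uncertain
# as a uniform choice among four options.
uniform_ppl = perplexity([0.25, 0.25, 0.25, 0.25])
```

Lower perplexity means the model assigns higher probability to the test text. A single zero probability makes the value undefined, which is one reason smoothing is needed.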

adding_one
dataset
good turing
LICENSE
n-gram实验报告.pdf
readme.md
北大语料库.zip
