This project investigates a topic in Natural Language Processing (NLP) while practicing the AVL tree data structure. In NLP, one often needs to count how many times each word occurs in a text.
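As an illustration of how an AVL tree can hold per-word counts, here is a minimal Python sketch. It is not the project's actual code: the names Node, insert, in_order, and count_words are hypothetical, and the tokenization (lowercasing and splitting on whitespace) is an assumption, since the project does not state how words are delimited.

```python
class Node:
    """One AVL node: a word, its occurrence count, and subtree height."""
    def __init__(self, word):
        self.word = word
        self.count = 1
        self.left = None
        self.right = None
        self.height = 1


def height(node):
    return node.height if node else 0


def update(node):
    node.height = 1 + max(height(node.left), height(node.right))


def rotate_right(y):
    x = y.left
    y.left = x.right
    x.right = y
    update(y)
    update(x)
    return x


def rotate_left(x):
    y = x.right
    x.right = y.left
    y.left = x
    update(x)
    update(y)
    return y


def insert(node, word):
    """Insert a word, or bump its count if present; return the new subtree root."""
    if node is None:
        return Node(word)
    if word == node.word:
        node.count += 1
        return node
    if word < node.word:
        node.left = insert(node.left, word)
    else:
        node.right = insert(node.right, word)
    update(node)
    balance = height(node.left) - height(node.right)
    if balance > 1:  # left-heavy
        if height(node.left.left) < height(node.left.right):
            node.left = rotate_left(node.left)  # left-right case
        return rotate_right(node)
    if balance < -1:  # right-heavy
        if height(node.right.right) < height(node.right.left):
            node.right = rotate_right(node.right)  # right-left case
        return rotate_left(node)
    return node


def in_order(node):
    """Yield (word, count) pairs in alphabetical order."""
    if node:
        yield from in_order(node.left)
        yield node.word, node.count
        yield from in_order(node.right)


def count_words(text):
    """Assumed tokenization: lowercase, split on whitespace."""
    root = None
    for word in text.lower().split():
        root = insert(root, word)
    return root
```

Because the tree rebalances on every insertion, its height stays logarithmic in the number of distinct words, so each lookup or update costs O(log n) comparisons even when the words arrive in sorted order.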
For language modeling, one often needs to know how many distinct words occur exactly r times in the text. Let N(r) be the number of distinct words that occur exactly r times. This program computes the values N(r) efficiently from a given text.
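To make the definition of N(r) concrete, the following Python sketch computes it for a short sentence. It is only an illustration: it uses the standard library's Counter (a hash table) in place of an AVL tree, the function name count_of_counts is hypothetical, and splitting on whitespace after lowercasing is an assumed tokenization.

```python
from collections import Counter


def count_of_counts(text):
    """Return a mapping r -> N(r) for the given text."""
    # Step 1: count occurrences of each distinct word (assumed tokenization).
    word_counts = Counter(text.lower().split())
    # Step 2: count how many distinct words share each occurrence count.
    return Counter(word_counts.values())


n = count_of_counts("the cat and the dog and the bird")
print(sorted(n.items()))  # [(1, 3), (2, 1), (3, 1)]
```

In this sentence "cat", "dog", and "bird" each occur once, so N(1) = 3; "and" occurs twice, so N(2) = 1; and "the" occurs three times, so N(3) = 1.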
A text file is specified via a command-line argument, and the N(r) values are printed to standard output on separate lines.
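The command-line behavior described above could look roughly like the Python sketch below. Several details are assumptions rather than the project's specification: the output format (one "r N(r)" pair per line, in increasing r), the UTF-8 encoding, the whitespace tokenization, and the names format_lines and main. A Counter stands in for the AVL tree to keep the sketch short.

```python
import sys
from collections import Counter


def format_lines(text):
    """Return one 'r N(r)' line per occurrence count r, in increasing r."""
    word_counts = Counter(text.lower().split())  # assumed tokenization
    n_of_r = Counter(word_counts.values())
    return [f"{r} {n_of_r[r]}" for r in sorted(n_of_r)]


def main(argv):
    """Read the file named by the first argument and print N(r) values."""
    if len(argv) != 2:
        print(f"usage: {argv[0]} TEXT_FILE", file=sys.stderr)
        return
    try:
        with open(argv[1], encoding="utf-8") as handle:  # encoding is assumed
            text = handle.read()
    except OSError as error:
        print(f"cannot read {argv[1]}: {error}", file=sys.stderr)
        return
    for line in format_lines(text):
        print(line)


if __name__ == "__main__":
    main(sys.argv)
```

Saved under a hypothetical name such as count_of_counts.py, the sketch would be invoked as "python count_of_counts.py input.txt".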