This project scans the text files in a specified directory, extracts their hapax legomena (words that appear only once), and saves the results to an Excel file.
It reads every .txt file in that directory, tokenizes the text, optionally lemmatizes each word, and counts how often each word occurs. Only words with a frequency of exactly one (the hapax legomena) are written to the Excel file.
- Read text files from a specified directory.
- Tokenize the text, with optional lemmatization.
- Count word frequencies and discard every word that occurs more than once.
- Save the remaining hapax legomena to an Excel file.
 
- Clone this repository.
- Make sure Go is installed. If it is not, install it from the official website: https://golang.org/dl/
- Install the required Go packages:
go get github.com/aaaton/golem/v4
go get github.com/xuri/excelize/v2
- Install the remaining dependencies for the project:
go mod tidy
 
- Change `absDirPath` in `main.go` to the path of your text files directory.
- Run the main program:
go run main.go
- After running the program, check the output file `hapax_list.xlsx` for the list of hapax legomena.
This project is licensed under the GNU General Public License v3.0 - see the GNU GENERAL PUBLIC LICENSE file for details.
- This project uses the Golem library for lemmatization.
- The output is saved using the excelize library.