Hapax Analysis

This project analyzes the text files in a specified directory, extracts the hapax legomena (words that appear only once), and saves the results to an Excel file.

Description

The project reads all .txt files from a specified directory, tokenizes the text, optionally lemmatizes the words, and counts how often each word occurs. Only words that occur exactly once (hapax legomena) are written to the Excel file.

Features

Read text files from a specified directory.
Tokenize text and optionally lemmatize it.
Count word frequency, filtering out non-hapax words.
Save results in an Excel file, listing hapax legomena.

Installation

Clone this repository.
Make sure Go is installed. If not, install Go from the official website: https://golang.org/dl/

Install the required Go packages:

```
go get github.com/aaaton/golem/v4
go get github.com/xuri/excelize/v2
```

Then make sure go.mod and go.sum list every module the project needs:
```
go mod tidy
```

Usage

Set absDirPath in main.go to the path of the directory that contains your text files.
Run the main program:
```
go run main.go
```
After running the program, check the output file hapax_list.xlsx for the list of hapax legomena.

License

This project is licensed under the GNU General Public License v3.0; see the LICENSE file for details.

Acknowledgments

This project uses the Golem library for lemmatization.
The output is saved using the excelize library.
