This project uses machine learning and natural language processing (NLP) to analyze classic literary works and predict the author of a given phrase. By examining textual patterns and stylistic nuances, the model learns to attribute a passage to its author.
- Objective: Develop a model that can predict the author of a text snippet from classic literature.
- Techniques Used:
  - Text vectorization and tokenization
  - Sequential modeling with LSTM (Long Short-Term Memory) networks
- Tools & Libraries:
  - Python
  - TensorFlow & Keras
  - Pandas & NumPy
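The tokenization and vectorization step listed under Techniques can be sketched in plain Python. This is only an illustration of the idea, not the code used in the project script: the helper names (`tokenize`, `build_vocab`, `vectorize`), the reserved ids, the sequence length, and the sample phrases are all assumptions made for the example.

```python
import re
from collections import Counter

PAD, OOV = 0, 1  # assumed reserved ids: padding and out-of-vocabulary


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocab(texts, max_words=10000):
    """Map the most frequent words to integer ids, starting at 2."""
    counts = Counter(tok for t in texts for tok in tokenize(t))
    return {w: i + 2 for i, (w, _) in enumerate(counts.most_common(max_words))}


def vectorize(text, vocab, seq_len=8):
    """Turn text into a fixed-length list of ids, truncated or padded with PAD."""
    ids = [vocab.get(tok, OOV) for tok in tokenize(text)][:seq_len]
    return ids + [PAD] * (seq_len - len(ids))


# Made-up placeholder phrases, not rows from the dataset.
texts = ["the whale surfaced", "the night was dark"]
vocab = build_vocab(texts)
vector = vectorize("the whale slept", vocab)
print(vector)
```

Words that never appeared when the vocabulary was built map to the out-of-vocabulary id, and short inputs are padded so every sequence has the same length, which is what an LSTM batch expects.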
- `Text_Author.csv`: Dataset containing text excerpts and corresponding author labels.
- `text-analysis-detect-author-seq-lstm.py`: Python script for data preprocessing, model training, and evaluation.
- `README.md`: Project documentation.
- `LICENSE`: MIT License.
Ensure you have the following installed:
- Python 3.x
- pip (Python package installer)
- Clone the repository:
  - `git clone https://github.com/markiskorova/Machine-Learning-NLP-Predict-Author.git`
  - `cd Machine-Learning-NLP-Predict-Author`
- Create and activate a virtual environment:
  - `python -m venv venv`
  - `source venv/bin/activate` (on Windows, use `venv\Scripts\activate`)
- Install required packages:
  - `pip install tensorflow pandas numpy`
Execute the script to train and evaluate the model:
`python text-analysis-detect-author-seq-lstm.py`
The script will process the data, train the LSTM model, and output evaluation metrics.
- Source: Curated collection of classic literary texts.
- Format: CSV file with two columns:
  - `text`: Excerpt from a literary work.
  - `author`: Name of the author.
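As a rough sketch of how a file with these two columns might be loaded and its author names turned into integer class labels, the example below uses pandas. The in-memory rows are invented placeholders standing in for `Text_Author.csv`, and the `label` column name is an assumption; the project script may prepare its labels differently.

```python
import io

import pandas as pd

# Stand-in for Text_Author.csv; these rows are invented placeholders.
sample_csv = io.StringIO(
    "text,author\n"
    "first placeholder excerpt,Author A\n"
    "second placeholder excerpt,Author B\n"
    "third placeholder excerpt,Author A\n"
)

# For the real file this would be: pd.read_csv("Text_Author.csv")
df = pd.read_csv(sample_csv)

# Map each distinct author name to an integer class id.
codes, authors = pd.factorize(df["author"])
df["label"] = codes  # "label" is a hypothetical column name

print(df[["author", "label"]])
print(list(authors))
```

Keeping the `authors` index around makes it possible to translate a predicted class id back into an author name.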
- Embedding Layer: Converts words into vector representations.
- LSTM Layer: Captures sequential dependencies in the text.
- Dense Output Layer: Outputs probabilities for each author class.
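The three layers above could be assembled in Keras roughly as follows. This is a minimal sketch under stated assumptions: the vocabulary size, sequence length, layer widths, number of authors, optimizer, and loss are illustrative values chosen for the example, not settings taken from the project script.

```python
import numpy as np
from tensorflow import keras

VOCAB_SIZE = 10000  # assumed vocabulary size
SEQ_LEN = 50        # assumed number of token ids per excerpt
NUM_AUTHORS = 3     # assumed number of author classes

model = keras.Sequential([
    # Embedding: token ids -> dense 64-dimensional vectors.
    keras.layers.Embedding(input_dim=VOCAB_SIZE, output_dim=64),
    # LSTM: reads the sequence and returns its final hidden state.
    keras.layers.LSTM(64),
    # Dense softmax: one probability per author class.
    keras.layers.Dense(NUM_AUTHORS, activation="softmax"),
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",  # expects integer class labels
    metrics=["accuracy"],
)

# Random token ids, used only to check the output shape.
batch = np.random.randint(0, VOCAB_SIZE, size=(2, SEQ_LEN))
probs = model.predict(batch, verbose=0)
print(probs.shape)
```

Each row of `probs` sums to 1, so the predicted author for an excerpt is the index of the largest value in its row.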
- Accuracy: Measures the proportion of correct predictions.
- Loss: Evaluates the model's prediction error.
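To make the two metrics concrete, the sketch below computes them by hand with NumPy on a few made-up probability rows. It assumes the loss is categorical cross-entropy, a common choice for multi-class classification; the source does not state which loss the script uses, and the function names here are hypothetical.

```python
import numpy as np


def accuracy(probs, labels):
    """Fraction of rows whose highest-probability class equals the true label."""
    return float(np.mean(np.argmax(probs, axis=1) == labels))


def cross_entropy(probs, labels, eps=1e-12):
    """Mean negative log-probability assigned to the true class (assumed loss)."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(np.clip(picked, eps, 1.0))))


# Made-up predictions for three excerpts over three author classes.
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.4, 0.3],
])
labels = np.array([0, 1, 2])  # true class ids

acc = accuracy(probs, labels)
loss = cross_entropy(probs, labels)
print(acc, loss)
```

The third excerpt is misclassified, so accuracy is two out of three, while the loss also penalizes how little probability was given to the correct class.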
- Incorporate more diverse literary works to improve model generalization.
- Experiment with advanced architectures like Bidirectional LSTMs or Transformers.
- Implement a user interface for interactive author prediction.
This project is licensed under the MIT License. See the LICENSE file for details.
Contributions are welcome! Please fork the repository and submit a pull request for any enhancements or bug fixes.
For questions or suggestions, feel free to open an issue or contact the repository maintainer.