Machine Learning & Natural Language Processing: Predict the author of literary text snippets. Built with TensorFlow and Keras, this project trains an LSTM model on classic literature to identify writing style and authorship.

markiskorova/Machine-Learning-NLP-Predict-Author

🧠 Machine Learning & NLP: Predicting Authors from Classic Literature

This project uses machine learning and natural language processing (NLP) to predict the author of a given phrase from classic literary works. By learning textual patterns and stylistic cues, the model attributes each passage to its most likely author.

📚 Overview

  • Objective: Develop a model that can predict the author of a text snippet from classic literature.
  • Techniques Used:
    • Text vectorization and tokenization
    • Sequential modeling with LSTM (Long Short-Term Memory) networks
  • Tools & Libraries:
    • Python
    • TensorFlow & Keras
    • Pandas & NumPy
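
To make the text vectorization and tokenization step concrete, here is a minimal, dependency-free sketch. It is an illustration only: the project script may well use a Keras tokenizer instead, and the helper names `build_vocab` and `encode`, the reserved ids 0 and 1, and the example snippets are assumptions that do not come from the repository.

```python
from collections import Counter

def build_vocab(texts, max_words=10000):
    """Map the most frequent words to integer ids (0 = padding, 1 = unknown)."""
    counts = Counter(word for text in texts for word in text.lower().split())
    return {word: idx + 2 for idx, (word, _) in enumerate(counts.most_common(max_words))}

def encode(text, vocab, max_len=10):
    """Turn one snippet into a fixed-length list of word ids."""
    ids = [vocab.get(word, 1) for word in text.lower().split()][:max_len]
    return ids + [0] * (max_len - len(ids))  # right-pad short snippets with 0

# Illustrative snippets, not rows taken from Text_Author.csv.
snippets = ["It was the best of times", "Call me Ishmael"]
vocab = build_vocab(snippets)
encoded = [encode(s, vocab, max_len=8) for s in snippets]
print(encoded)
```

Every snippet becomes a list of the same length, which is the input shape a sequence model such as an LSTM expects.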

📁 Repository Structure

  • Text_Author.csv: Dataset containing text excerpts and corresponding author labels.
  • text-analysis-detect-author-seq-lstm.py: Python script for data preprocessing, model training, and evaluation.
  • README.md: Project documentation.
  • LICENSE: MIT License.

🚀 Getting Started

Prerequisites

Ensure you have the following installed:

  • Python 3.x
  • pip (Python package installer)

Installation

  1. Clone the repository:

    git clone https://github.com/markiskorova/Machine-Learning-NLP-Predict-Author.git
    cd Machine-Learning-NLP-Predict-Author
  2. Create and activate a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  3. Install required packages:

    pip install tensorflow pandas numpy

Running the Model

Execute the script to train and evaluate the model:

    python text-analysis-detect-author-seq-lstm.py

The script will process the data, train the LSTM model, and output evaluation metrics.

📊 Dataset Details

  • Source: Curated collection of classic literary texts.
  • Format: CSV file with two columns:
    • text: Excerpt from a literary work.
    • author: Name of the author.
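
As a rough sketch of how a file in this two-column format could be loaded and its author names turned into integer class labels, consider the following. The in-memory rows and the "Author A" / "Author B" names are hypothetical placeholders so the example runs without the dataset, and using `pd.factorize` is an assumption rather than the script's confirmed approach.

```python
import io
import pandas as pd

# Hypothetical stand-in rows; for the real data use pd.read_csv("Text_Author.csv").
sample_csv = io.StringIO(
    "text,author\n"
    '"It was the best of times",Author A\n'
    '"Call me Ishmael",Author B\n'
    '"It was the worst of times",Author A\n'
)
df = pd.read_csv(sample_csv)

# Drop incomplete rows, then map each author name to an integer class id.
df = df.dropna(subset=["text", "author"])
labels, authors = pd.factorize(df["author"])

print(list(labels))   # one integer label per row
print(list(authors))  # the author name behind each label
```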

🔍 Model Architecture

  • Embedding Layer: Converts words into vector representations.
  • LSTM Layer: Captures sequential dependencies in the text.
  • Dense Output Layer: Outputs probabilities for each author class.
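
The three layers above could be assembled in Keras roughly as follows. This is a sketch under stated assumptions, not the exact model in the script: the vocabulary size, embedding width, LSTM units, sequence length, number of authors, optimizer, and loss function are all illustrative values chosen here.

```python
import numpy as np
import tensorflow as tf

# All sizes below are assumptions for illustration, not values from the repository.
VOCAB_SIZE = 10000
EMBED_DIM = 64
LSTM_UNITS = 64
NUM_AUTHORS = 3
SEQ_LEN = 40

model = tf.keras.Sequential([
    # Word ids -> dense vectors
    tf.keras.layers.Embedding(input_dim=VOCAB_SIZE, output_dim=EMBED_DIM),
    # Reads the sequence in order and returns its final hidden state
    tf.keras.layers.LSTM(LSTM_UNITS),
    # One probability per author class
    tf.keras.layers.Dense(NUM_AUTHORS, activation="softmax"),
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",  # assumes integer author labels
    metrics=["accuracy"],
)

# Run a dummy batch of two padded sequences through the untrained model.
dummy_batch = np.zeros((2, SEQ_LEN), dtype="int32")
probs = model.predict(dummy_batch, verbose=0)
print(probs.shape)
```

Each output row holds one probability per author and sums to 1, so the predicted author is the index of the largest value in the row.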

📈 Evaluation Metrics

  • Accuracy: Measures the proportion of correct predictions.
  • Loss: Value of the training loss function on the evaluated data; lower is better.
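
To show how these two numbers relate to the model's output, here is a small NumPy sketch. It assumes the loss is categorical cross-entropy, a common choice for a probability-producing output layer that the README does not confirm, and the probabilities and labels are made up for illustration.

```python
import numpy as np

# Made-up predicted probabilities for three snippets and three author classes.
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.4, 0.3],
])
true_labels = np.array([0, 1, 2])

# Accuracy: share of snippets whose most probable author is the true one.
predicted = probs.argmax(axis=1)
accuracy = float((predicted == true_labels).mean())

# Cross-entropy (assumed loss): mean negative log-probability of the true author.
true_probs = probs[np.arange(len(true_labels)), true_labels]
loss = float(-np.log(true_probs).mean())

print(f"accuracy={accuracy:.3f} loss={loss:.3f}")
```

Two of the three snippets are classified correctly here, so accuracy is 2/3. The loss also penalizes the third snippet for assigning only 0.3 probability to its true author.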

🛠️ Future Enhancements

  • Incorporate more diverse literary works to improve model generalization.
  • Experiment with advanced architectures like Bidirectional LSTMs or Transformers.
  • Implement a user interface for interactive author prediction.

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

🤝 Contributing

Contributions are welcome! Please fork the repository and submit a pull request for any enhancements or bug fixes.

📬 Contact

For questions or suggestions, feel free to open an issue or contact the repository maintainer.
