Word File Parser

A Python library for parsing Word documents (.docx) and splitting them into sections using the unstructured library.

Features

Parse Word documents into sections based on headings
Extract section content and metadata
Save sections to individual text files
Support for nested sections with heading levels
Robust handling of document structure
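
To make the heading-based splitting concrete, here is a minimal sketch of the general idea. It is not this library's implementation: the Section class, the split_into_sections function, and the (kind, level, text) tuples are hypothetical stand-ins for whatever elements the parser actually receives.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """Hypothetical container for illustration; not the library's own class."""
    title: str
    level: int
    lines: list = field(default_factory=list)

    @property
    def content(self) -> str:
        return "\n".join(self.lines)


def split_into_sections(elements):
    """Group (kind, level, text) tuples into sections.

    kind is "heading" or "text". level is the heading depth (1 = top level)
    and is ignored for text elements.
    """
    sections = []
    current = None
    for kind, level, text in elements:
        if kind == "heading":
            # Every heading starts a new section and records its depth.
            current = Section(title=text, level=level)
            sections.append(current)
        elif current is not None:
            current.lines.append(text)
        # Text that appears before the first heading is dropped in this sketch.
    return sections


# Assumed input shape, invented for this example.
elements = [
    ("heading", 1, "Introduction"),
    ("text", 0, "Background on the project."),
    ("heading", 2, "Scope"),
    ("text", 0, "What is covered."),
    ("text", 0, "What is not covered."),
    ("heading", 1, "Methods"),
    ("text", 0, "How the work was done."),
]

sections = split_into_sections(elements)
for s in sections:
    print("  " * (s.level - 1) + s.title, "->", len(s.lines), "paragraph(s)")
```

Keeping the heading level on each section is what lets nested sections be reconstructed later, for example by indenting on output as the loop above does.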

Installation

Clone the repository:

git clone git@github.com:project-delphi/word_file_parser.git
cd word_file_parser

Create and activate a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install the package in development mode:

pip install -e .

Usage

from word_file_parser import DocxParser

# Initialize parser with a Word document
parser = DocxParser("path/to/document.docx")

# Parse sections
sections = parser.parse_sections()

# Get a specific section
introduction = parser.get_section("Introduction")

# Save all sections to files
parser.save_sections_to_files("output_directory")

# Save a specific section
parser.save_section("Methods", "output_directory")
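
How the parser names its output files is not documented here. As an illustration only, the sketch below shows one plausible way to write sections to individual text files. The safe_filename and write_sections helpers, and the title-to-content dictionary, are assumptions made for this example and are not part of the package's API.

```python
import re
import tempfile
from pathlib import Path


def safe_filename(title: str) -> str:
    """Turn a section title into a filesystem-friendly file name."""
    slug = re.sub(r"[^A-Za-z0-9]+", "_", title).strip("_").lower()
    return (slug or "untitled") + ".txt"


def write_sections(sections: dict, output_dir) -> list:
    """Write each title -> content pair to its own UTF-8 text file."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for title, content in sections.items():
        path = out / safe_filename(title)
        path.write_text(content, encoding="utf-8")
        written.append(path)
    return written


# Hypothetical section data, invented for this example.
demo = {"Introduction": "Background.", "Methods & Materials": "Procedure."}
with tempfile.TemporaryDirectory() as tmp:
    paths = write_sections(demo, tmp)
    names = sorted(p.name for p in paths)
    first_text = (Path(tmp) / "introduction.txt").read_text(encoding="utf-8")

print(names)
```

Sanitizing titles before using them as file names avoids problems with characters such as slashes or ampersands in headings. Two titles that reduce to the same slug would overwrite each other in this sketch.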

Development

Running Tests

python -m pytest tests/

Code Style

This project uses:

black for code formatting
isort for import sorting
flake8 for linting

License

MIT License
