Latest commit: "Documentation & Updates (#5)" by pszemraj, Jan 21, 2023 (419eb3b), 14 commits in total.

textsum

utility for using transformers summarization models on text docs

This package provides easy-to-use interfaces for applying summarization models to text documents of arbitrary length. Currently implemented interfaces include a Python API, a CLI, and a shareable demo app.

For details, explanations, and docs, see the wiki.

⚠️ This is a WIP, but general functionality is available ⚠️


Installation

Install using pip:

# create a virtual environment (optional)
pip install textsum

The textsum package is now installed in your virtual environment, and you can use the CLI commands and the Python API to summarize text docs from anywhere. See the Usage section for more details.

Full Installation

To install all the dependencies (including PDF OCR, the gradio UI demo, optimum, etc.), run:

git clone https://github.com/pszemraj/textsum.git
cd textsum
# create a virtual environment (optional)
pip install -e .[all]

Additional Details

This package uses the clean-text Python package and, like the "base" version of clean-text, does not include the GPL-licensed unidecode dependency. If you want to use unidecode, install it as an extra with pip:

pip install textsum[unidecode]

In practice, text cleaning pre-summarization with/without unidecode should not make a significant difference.
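To give a feel for what ASCII transliteration does to text, here is a small sketch that uses only the standard library's `unicodedata` module. It is a cruder stand-in and not how clean-text or unidecode work: it decomposes characters (NFKD) and then drops anything that is not ASCII, whereas unidecode transliterates characters that have no decomposition (such as "ß") instead of dropping them. `fold_to_ascii` is a hypothetical helper, not part of textsum.

```python
import unicodedata


def fold_to_ascii(text: str) -> str:
    """Rough stdlib approximation of ASCII folding (hypothetical helper):
    split accented characters into base letter + combining mark (NFKD),
    then drop every character that cannot be encoded as ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(fold_to_ascii("Café déjà vu, naïve façade"))  # prints: Cafe deja vu, naive facade
print(fold_to_ascii("Straße"))  # prints: Strae ("ß" has no decomposition, so it is dropped)
```

Accented Latin text comes through nearly intact either way, which fits the point that cleaning with or without unidecode rarely matters much for summarization; text with many non-decomposable characters is where the approaches diverge.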

Usage

There are three ways to use this package:

Python API
CLI
Demo App

Python API

To use the Python API, import the Summarizer class and instantiate it. This will load the default model and parameters.

You can then use the summarize_string method to summarize a long string of text.

from textsum.summarize import Summarizer

summarizer = Summarizer() # loads default model and parameters

# summarize a long string
out_str = summarizer.summarize_string('This is a long string of text that will be summarized.')
print(f'summary: {out_str}')

You can also summarize a file directly:

out_path = summarizer.summarize_file('/path/to/file.txt')
print(f'summary saved to {out_path}')
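If you want something similar to `textsum-dir` from inside Python, one option is a small loop over the files. The helper below is a hypothetical sketch: `summarize_directory` and the `_summary.txt` output naming are not part of textsum. It accepts any string-to-string function, so you could pass `summarizer.summarize_string` from the Python API example, and it writes one summary file per `.txt` input.

```python
from pathlib import Path
from typing import Callable, Dict


def summarize_directory(
    input_dir: str,
    summarize_fn: Callable[[str], str],
    output_dir: str,
) -> Dict[str, Path]:
    """Hypothetical helper: summarize every .txt file directly inside
    input_dir and write each summary to output_dir.

    Returns a mapping from input file name to the written summary path."""
    out_root = Path(output_dir)
    out_root.mkdir(parents=True, exist_ok=True)  # create output dir if needed
    written: Dict[str, Path] = {}
    # sorted() gives a deterministic processing order; glob is non-recursive
    for path in sorted(Path(input_dir).glob("*.txt")):
        summary = summarize_fn(path.read_text(encoding="utf-8"))
        out_path = out_root / f"{path.stem}_summary.txt"  # assumed naming scheme
        out_path.write_text(summary, encoding="utf-8")
        written[path.name] = out_path
    return written


# Example usage (requires textsum and a loaded Summarizer):
# summarize_directory("/path/to/dir", summarizer.summarize_string, "/path/to/summaries")
```

Passing the summarization step in as a function keeps the loop independent of any particular model, which also makes it easy to test with a dummy function.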

CLI

To summarize a directory of text files, run the following command:

textsum-dir /path/to/dir

The following options are available:

usage: textsum-dir [-h] [-o OUTPUT_DIR] [-m MODEL_NAME] [-batch BATCH_LENGTH] [-stride BATCH_STRIDE] [-nb NUM_BEAMS]
                   [-l2 LENGTH_PENALTY] [-r2 REPETITION_PENALTY] [--no_cuda] [-length_ratio MAX_LENGTH_RATIO] [-ml MIN_LENGTH]
                   [-enc_ngram ENCODER_NO_REPEAT_NGRAM_SIZE] [-dec_ngram NO_REPEAT_NGRAM_SIZE] [--no_early_stopping] [--shuffle]
                   [--lowercase] [-v] [-vv] [-lf LOGFILE]
                   input_dir

For more information, run:

textsum-dir --help
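The `-batch BATCH_LENGTH` and `-stride BATCH_STRIDE` options hint at how documents of arbitrary length can be handled: split the text into batches that overlap slightly, summarize each batch, and combine the pieces. The sketch below illustrates that idea under stated assumptions: whitespace-separated words stand in for model tokens, the stride is taken to be the number of tokens shared by consecutive batches, and per-batch summaries are joined with newlines. `chunk_tokens` and `summarize_in_chunks` are hypothetical names, not textsum functions.

```python
from typing import Callable, List


def chunk_tokens(tokens: List[str], batch_length: int, batch_stride: int) -> List[List[str]]:
    """Split tokens into windows of at most batch_length tokens; each window
    starts with the last batch_stride tokens of the previous one."""
    if batch_length <= 0:
        raise ValueError("batch_length must be positive")
    if not 0 <= batch_stride < batch_length:
        raise ValueError("batch_stride must be in [0, batch_length)")
    step = batch_length - batch_stride  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + batch_length])
        if start + batch_length >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks


def summarize_in_chunks(
    text: str,
    summarize_fn: Callable[[str], str],
    batch_length: int,
    batch_stride: int,
) -> str:
    """Summarize each overlapping chunk separately and join the results."""
    words = text.split()  # assumption: words stand in for model tokens
    chunks = chunk_tokens(words, batch_length, batch_stride)
    return "\n".join(summarize_fn(" ".join(chunk)) for chunk in chunks)


words = "one two three four five six seven eight nine ten".split()
for chunk in chunk_tokens(words, batch_length=4, batch_stride=1):
    print(chunk)
# prints three windows; consecutive windows share one word ("four", then "seven")
```

A real implementation would count tokens with the model's tokenizer rather than splitting on whitespace; the overlap presumably keeps some context at batch boundaries so that text near a cut is not summarized in isolation.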

Demo App

For convenience, a UI demo¹ is provided using gradio. To ensure you have the dependencies installed, clone the repo and run the following command:

pip install textsum[app]

To run the demo, run the following command:

textsum-ui

This starts a local server that you can access in your browser and prints a shareable link to the console.

Contributing

Contributions are welcome! Please open an issue or PR if you have any ideas or suggestions.

See the CONTRIBUTING.md file for details on how to contribute.

Roadmap

add CLI for summarization of all text files in a directory
python API for summarization of text docs
add argparse CLI for UI demo
put on PyPI
optimum inference integration, LLM.int8 inference
better documentation in the wiki, details on improving performance (speed, quality, memory usage, etc.)
improvements to OCR helper module

Other ideas? Open an issue or PR!

¹ The demo is currently minimal, but will be expanded in the future to accept other arguments and options.
