Image-Text-Summarizer

This tool is mainly useful when reading novels, stories, or any other text made up of long paragraphs. The reader takes a photo of a paragraph and passes it to the model, which returns a summary of that paragraph. The goal is to make reading easier and faster.

Technologies Used

  • Optical Character Recognition (OCR)
  • Natural Language Processing (NLP)

Create a file named ocr.py in the project folder.

Set Up Virtual Environment

$ virtualenv venv --python=python3.6

$ source venv/bin/activate

Install dependencies

  • Pillow: pip3 install Pillow
  • Pytesseract: pip3 install pytesseract
  • OpenCV: pip3 install opencv-python
  • NLTK: pip3 install nltk

Run ocr.py

python3 ocr.py --image images/story1.jpg > story.txt

story.txt

This file contains all the text extracted from the image story1.jpg by OCR with pytesseract.
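
This README does not show the contents of ocr.py, so the following is only a minimal sketch of what such a script might look like, not the project's actual code. It assumes the `--image` flag from the command above; the function names `build_parser` and `extract_text` and the grayscale preprocessing step are illustrative assumptions. Note that pytesseract is a wrapper around the Tesseract OCR engine, which has to be installed separately.

```python
import argparse
import sys


def build_parser():
    """Command-line interface matching the `--image` flag used above."""
    parser = argparse.ArgumentParser(description="Print the text found in an image.")
    parser.add_argument("--image", required=True, help="path to the input image")
    return parser


def extract_text(image_path):
    """Return the text that Tesseract recognizes in the image (hypothetical helper)."""
    # Third-party imports are kept local so the argument parsing above
    # can be used even when the OCR stack is not installed.
    import cv2
    import pytesseract
    from PIL import Image

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(f"could not read image: {image_path}")
    # Assumption: converting to grayscale before OCR; the original
    # script may preprocess the image differently or not at all.
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    return pytesseract.image_to_string(Image.fromarray(gray))


if __name__ == "__main__":
    parser = build_parser()
    if len(sys.argv) > 1:
        # Printing to stdout lets the shell redirect the text into story.txt.
        print(extract_text(parser.parse_args().image))
    else:
        parser.print_usage()
```

Because the script only prints to standard output, the `> story.txt` redirection in the command above is what actually creates the text file.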

Create a new file named summarize.py

summarize.py

In this file we use Python's NLTK library to remove stop words and punctuation, along with its word and sentence tokenizers.
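
The README does not say how the summary is built from the tokens, so the sketch below assumes a simple frequency-based extractive approach: each sentence is scored by how often its non-stop-word tokens occur in the whole text, and the top-scoring sentences are kept. The function names `tokenize` and `summarize` and the `max_sentences` parameter are hypothetical, and the real summarize.py may work differently.

```python
import sys
from collections import Counter


def tokenize(text):
    """Split text into sentences and per-sentence lists of content words."""
    # Imported locally so the scoring function below can be reused without NLTK.
    from nltk.corpus import stopwords
    from nltk.tokenize import sent_tokenize, word_tokenize

    stop_words = set(stopwords.words("english"))
    sentences = sent_tokenize(text)
    words = [
        [
            w.lower()
            for w in word_tokenize(sentence)
            # Drop stop words and tokens that are pure punctuation.
            if w.lower() not in stop_words and any(ch.isalnum() for ch in w)
        ]
        for sentence in sentences
    ]
    return sentences, words


def summarize(sentences, words, max_sentences=3):
    """Return the highest-scoring sentences, kept in their original order."""
    freq = Counter(w for sentence_words in words for w in sentence_words)
    # Score each sentence by the summed frequency of its content words.
    scores = [sum(freq[w] for w in sentence_words) for sentence_words in words]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python3 summarize.py story.txt", file=sys.stderr)
    else:
        # Assumption: the NLTK tokenizer and stop-word data have already been
        # fetched with nltk.download(); package names vary by NLTK version.
        with open(sys.argv[1], encoding="utf-8") as f:
            sents, toks = tokenize(f.read())
        print(summarize(sents, toks))
```

Summing raw frequencies favors longer sentences; dividing each score by the sentence length is a common variation if shorter summaries are preferred.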

Run summarize.py

python3 summarize.py story.txt > summary.txt

summary.txt

This file contains the summary of the text file story.txt.