Commit de62daf

1 parent d45c718 commit de62daf

2 files changed: 4 additions, 2 deletions

ScrapePDF/Readme.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 # Script to scrape pdf
 
 ## Overview:
-- A beginner friendly script to scrape pdf. You can easily get document info sunch as creator , crceation_date and no. of pages. Extract as many pages as you want.
+- A beginner friendly script to scrape pdf. You can easily get document info sunch as creator , creation_date and no. of pages. Extract as many pages as you want.
 
 
 ### Installing required libraries

ScrapePDF/pdfscrapper.py

Lines changed: 3 additions & 1 deletion
@@ -1,7 +1,8 @@
 # import PyPDF2 library
 import PyPDF2 as p2
 
-PDFfile = open("File path here.pdf", "rb")
+pdffile = input("Enter path to pdf file you want to scrape: \n")
+PDFfile = open(pdffile, "rb")
 pdfread = p2.PdfFileReader(PDFfile)
 
 
@@ -24,6 +25,7 @@
 
 
 # Extract entire pdf
+print("---------ENTIRE PDF----------")
 i = 0
 while i<pdfread.getNumPages():
     pageinfo = pdfread.getPage(i)
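
The change above passes whatever the user types straight to `open()`. One possible follow-up, sketched here and not part of this commit, is to validate the path first and read pages through a context manager. The helper names `resolve_pdf_path`, `iter_page_text`, and `main` are hypothetical. The sketch also assumes PyPDF2 2.x or later, where `PdfReader`, `reader.pages`, and `page.extract_text()` replace the `PdfFileReader`, `getNumPages()`, and `getPage()` calls seen in the diff.

```python
from pathlib import Path


def resolve_pdf_path(raw: str) -> Path:
    """Validate a user-supplied path (hypothetical helper, not in the commit)."""
    # Drop surrounding whitespace and quotes that often come with pasted paths.
    path = Path(raw.strip().strip('"')).expanduser()
    if path.suffix.lower() != ".pdf":
        raise ValueError(f"not a .pdf file: {path}")
    if not path.is_file():
        raise FileNotFoundError(f"no such file: {path}")
    return path


def iter_page_text(path: Path):
    """Yield the extracted text of each page in order."""
    # Third-party dependency, imported lazily; assumes PyPDF2 >= 2.0.
    from PyPDF2 import PdfReader

    with path.open("rb") as handle:  # file is closed even if reading fails
        reader = PdfReader(handle)
        for page in reader.pages:
            yield page.extract_text()


def main() -> None:
    pdf_path = resolve_pdf_path(
        input("Enter path to pdf file you want to scrape: \n")
    )
    print("---------ENTIRE PDF----------")
    for text in iter_page_text(pdf_path):
        print(text)
```

Calling `main()` reproduces the prompt and banner added by the commit. A mistyped path then raises a descriptive error before any PDF parsing starts.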
