Getting Started
First, build Extract. We recommend copying the extract.jar file to a location that makes it available system-wide and installing a small wrapper script that lets you run it from any directory.
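A minimal sketch of such a wrapper follows. It assumes that extract.jar is a self-contained jar launched with `java -jar`, that you have already copied it into `$JAR_DIR`, and that `$BIN_DIR` is on your `PATH`. The default locations under `~/.local` are illustrative choices, not paths Extract requires; for a system-wide install you could set `JAR_DIR=/usr/local/share/extract` and `BIN_DIR=/usr/local/bin` and run the script with root privileges.

```shell
#!/bin/sh
# Sketch: generate an `extract` wrapper so the jar can be run from anywhere.
# Assumption: extract.jar already sits in $JAR_DIR (illustrative default below).
JAR_DIR="${JAR_DIR:-$HOME/.local/share/extract}"
BIN_DIR="${BIN_DIR:-$HOME/.local/bin}"
mkdir -p "$BIN_DIR"

# Unquoted heredoc: $JAR_DIR is expanded now, while \$JAVA_OPTS and \$@ stay
# literal so they are expanded each time the wrapper runs. JAVA_OPTS is left
# unquoted on purpose so several JVM flags split into separate arguments.
cat > "$BIN_DIR/extract" <<EOF
#!/bin/sh
exec java \$JAVA_OPTS -jar "$JAR_DIR/extract.jar" "\$@"
EOF
chmod +x "$BIN_DIR/extract"
```

Because the wrapper forwards `"$@"` unchanged, `extract -h` behaves exactly as `java $JAVA_OPTS -jar .../extract.jar -h` would.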
You'll need to set JAVA_OPTS before running Extract; whatever this environment variable contains is passed to the JVM. At a minimum, set the amount of memory the JVM may use. For example:
echo "export JAVA_OPTS=\"-Xms512m -Xmx1024m\"" >> ~/.bashrc
source ~/.bashrc
From then on, the JVM running Extract will start with a 512 MB heap and can grow it to at most 1 GB.
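Running the `echo ... >> ~/.bashrc` line twice appends the export twice. A slightly more defensive variant, sketched below, only appends the line if it is not already present. The flags are the same as above: `-Xms512m` sets the initial heap size and `-Xmx1024m` caps the maximum heap at 1 GB.

```shell
#!/bin/sh
# Append the JAVA_OPTS export to ~/.bashrc only if that exact line is not
# already there, so re-running this does not pile up duplicate entries.
line='export JAVA_OPTS="-Xms512m -Xmx1024m"'
grep -qxF "$line" "$HOME/.bashrc" 2>/dev/null || printf '%s\n' "$line" >> "$HOME/.bashrc"
```

Afterwards, open a new shell or run `source ~/.bashrc` so the variable is set in your current session.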
Run extract -h to view a list of available commands, and extract -h [command] for help on a particular command.
Remember that text will not be extracted from images (including those embedded in PDFs) unless you have Tesseract installed.
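To check for Tesseract before a run, a quick test like the one below can help. It assumes Extract finds the `tesseract` executable through your `PATH`, which is an assumption here rather than something this page states; `have_tesseract` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a `tesseract` executable is on PATH.
# Assumption: Extract locates Tesseract via PATH.
have_tesseract() {
  command -v tesseract >/dev/null 2>&1
}

if have_tesseract; then
  echo "tesseract found at $(command -v tesseract)"
  # Some versions print the version to stderr, so merge it into stdout.
  tesseract --version 2>&1 | head -n 1
else
  echo "tesseract not found: text inside images will not be extracted" >&2
fi
```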
Extract can be used in many ways, from a single instance to a distributed, parallel-processing setup. See our Workflows page.