This application processes invoices and checks using OpenAI's GPT-4o for image recognition and Snowflake for data storage. It extracts relevant details from uploaded documents and stores structured data in a Snowflake database.
- Parses invoices and checks from PDF and image files.
- Uses OpenAI's GPT-4o for extracting structured data.
- Stores extracted data in a Snowflake database.
- Built with Streamlit for an interactive web interface.
- Supports invoice itemization and structured check data extraction.
- Python
- Streamlit
- OpenAI GPT-4o (via LangChain)
- Snowflake
- PyMuPDF (imported as fitz) for PDF processing
- Pandas for data manipulation
- Clone the repository
- Install dependencies:
pip install -r requirements.txt
- Set your OpenAI API key and Snowflake credentials in keys.py:
OPENAI_API_KEY = "your_openai_api_key"
SNOWFLAKE_USER = "your_snowflake_user"
SNOWFLAKE_PASSWORD = "your_snowflake_password"
SNOWFLAKE_ACCOUNT = "your_snowflake_account"
SNOWFLAKE_ROLE = "your_snowflake_role"
SNOWFLAKE_WAREHOUSE = "your_snowflake_warehouse"
SNOWFLAKE_DATABASE = "your_snowflake_database"
- Run the application:
streamlit run main.py
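As a rough illustration of how the constants in keys.py might feed the Snowflake connection, the sketch below maps them to the keyword arguments that `snowflake.connector.connect()` accepts. The helper name `snowflake_connect_kwargs` and the `SimpleNamespace` stand-in for the keys module are assumptions made for this example; the application's actual wiring may differ.

```python
from types import SimpleNamespace


def snowflake_connect_kwargs(keys):
    """Hypothetical helper: turn the keys.py constants into connect() kwargs."""
    return {
        "user": keys.SNOWFLAKE_USER,
        "password": keys.SNOWFLAKE_PASSWORD,
        "account": keys.SNOWFLAKE_ACCOUNT,
        "role": keys.SNOWFLAKE_ROLE,
        "warehouse": keys.SNOWFLAKE_WAREHOUSE,
        "database": keys.SNOWFLAKE_DATABASE,
    }


# Stand-in for `import keys`, so the sketch runs without a real keys.py.
example_keys = SimpleNamespace(
    SNOWFLAKE_USER="your_snowflake_user",
    SNOWFLAKE_PASSWORD="your_snowflake_password",
    SNOWFLAKE_ACCOUNT="your_snowflake_account",
    SNOWFLAKE_ROLE="your_snowflake_role",
    SNOWFLAKE_WAREHOUSE="your_snowflake_warehouse",
    SNOWFLAKE_DATABASE="your_snowflake_database",
)

kwargs = snowflake_connect_kwargs(example_keys)

# In the application this would presumably become:
#   import snowflake.connector
#   conn = snowflake.connector.connect(**kwargs)
```

Since keys.py holds real credentials once filled in, it is worth keeping it out of version control.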
- Select "Invoice" or "Check" in the Streamlit UI.
- Upload the relevant document (PDF for invoices, image for checks).
- The application extracts structured data and displays it.
- Invoice data is stored in Snowflake for further processing.
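The extraction step in the workflow above depends on sending the uploaded image to GPT-4o. One common way to do this is to embed the image as a base64 data URL inside a multimodal message. The sketch below only builds that message content; the function name, prompt text, and placeholder bytes are assumptions for illustration and not taken from the application's code.

```python
import base64


def build_image_message_content(image_bytes, prompt, mime_type="image/png"):
    """Hypothetical helper: pair a text prompt with an inline base64 image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return [
        {"type": "text", "text": prompt},
        {
            "type": "image_url",
            "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
        },
    ]


# Placeholder bytes; in the app these would come from the uploaded file
# (for a PDF invoice, from a page rendered to an image first).
sample_bytes = b"not-a-real-image"
content = build_image_message_content(
    sample_bytes, "Extract the check details as JSON."
)

# With LangChain, a list like this can be used as the content of a
# HumanMessage and passed to a chat model configured for GPT-4o.
```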
- Vendor Name
- Invoice Number
- Customer Number
- Invoice Date
- PO Number & Date
- Items (Quantity, Price, Description, etc.)
- Total Invoice Amount
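To make the invoice fields above concrete, here is a minimal sketch of how a JSON response from the model could be parsed into typed records. The class names, field names, and sample values are all hypothetical; the application's real schema is not shown in this README.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class InvoiceItem:
    description: str
    quantity: float
    price: float


@dataclass
class Invoice:
    vendor_name: str
    invoice_number: str
    customer_number: str
    invoice_date: str
    po_number: str
    po_date: str
    total_amount: float
    items: List[InvoiceItem] = field(default_factory=list)


def parse_invoice(raw_json: str) -> Invoice:
    """Convert a JSON string (assumed model output) into an Invoice."""
    data = json.loads(raw_json)
    # Pull the line items out first so the rest maps onto Invoice fields.
    items = [InvoiceItem(**item) for item in data.pop("items", [])]
    return Invoice(items=items, **data)


# Illustrative payload only; not real extraction output.
sample = json.dumps({
    "vendor_name": "Example Vendor",
    "invoice_number": "INV-1",
    "customer_number": "C-100",
    "invoice_date": "2024-01-15",
    "po_number": "PO-7",
    "po_date": "2024-01-10",
    "total_amount": 19.0,
    "items": [{"description": "Widget", "quantity": 2, "price": 9.5}],
})
invoice = parse_invoice(sample)
```

A flat list of `InvoiceItem` records like this converts directly into a Pandas DataFrame for display or loading.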
- Check Number
- Payee Name
- Amount (Numeric & Words)
- Check Date
- Bank Details
- Memo & Payer Information
- Deletes existing records for an uploaded invoice before inserting new data.
- Stores structured invoice data, including itemized details.
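The delete-then-insert behavior described above keeps a re-uploaded invoice from producing duplicate rows. The sketch below shows the pattern using an in-memory SQLite database as a stand-in for Snowflake so that it runs anywhere. The table names, column names, and the `replace_invoice` function are assumptions, and the Snowflake connector's default placeholder style (`%s`) differs from SQLite's `?`.

```python
import sqlite3


def replace_invoice(conn, invoice_number, vendor_name, items):
    """Delete any existing rows for this invoice, then insert fresh ones."""
    cur = conn.cursor()
    cur.execute(
        "DELETE FROM invoice_items WHERE invoice_number = ?", (invoice_number,)
    )
    cur.execute("DELETE FROM invoices WHERE invoice_number = ?", (invoice_number,))
    cur.execute(
        "INSERT INTO invoices (invoice_number, vendor_name) VALUES (?, ?)",
        (invoice_number, vendor_name),
    )
    cur.executemany(
        "INSERT INTO invoice_items (invoice_number, description, quantity, price) "
        "VALUES (?, ?, ?, ?)",
        [(invoice_number, desc, qty, price) for desc, qty, price in items],
    )
    # Commit once so the delete and insert take effect together.
    conn.commit()


# SQLite stand-in with hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (invoice_number TEXT, vendor_name TEXT)")
conn.execute(
    "CREATE TABLE invoice_items "
    "(invoice_number TEXT, description TEXT, quantity REAL, price REAL)"
)

# Uploading the same invoice twice leaves only the latest data.
replace_invoice(conn, "INV-1", "Example Vendor", [("Widget", 2, 9.5)])
replace_invoice(conn, "INV-1", "Example Vendor", [("Widget", 3, 9.5)])

invoice_rows = conn.execute("SELECT invoice_number FROM invoices").fetchall()
item_rows = conn.execute("SELECT quantity FROM invoice_items").fetchall()
```

Running the delete and insert inside a single transaction means a failure partway through does not leave the invoice half-written.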