Azure Storage + Document Intelligence + Function App + Cosmos DB
Costa Rica
Last updated: 2025-05-16
Important
This example is based on a public network site and is intended for demonstration purposes only. It showcases how several Azure resources can work together to achieve the desired result. Consider the section below about Important Considerations for Production Environment. Please note that these demos are intended as a guide and are based on my personal experiences. For official guidance, support, or more detailed information, please refer to Microsoft's official documentation or contact Microsoft directly: Microsoft Sales and Support
How to parse PDFs from an Azure Storage Account, process them using Azure Document Intelligence, and store the results in Cosmos DB for further analysis.
- Upload your PDFs to an Azure Blob Storage container.
- An Azure Function is triggered by the upload, which calls the Azure Document Intelligence API to analyze the PDFs.
- The extracted data is parsed and subsequently stored in a Cosmos DB database, ensuring a seamless and automated workflow from document upload to data storage.
Note
Advantages of Document Intelligence for organizations handling large volumes of documents:
- Utilizes natural language processing, computer vision, deep learning, and machine learning.
- Handles structured, semi-structured, and unstructured documents.
- Automates the extraction and transformation of data into usable formats like JSON or CSV.

List of References
- Azure AI Document Intelligence documentation
- Get started with the Document Intelligence Sample Labeling tool
- Document Intelligence Sample Labeling tool
- Assign an Azure role for access to blob data
- Build and train a custom extraction model
- Compose custom models - Document Intelligence
- Deploy the Sample Labeling tool
- Train a custom model using the Sample Labeling tool
- Train models with the sample-labeling tool
- Azure Cosmos DB - Database for the AI Era
- Consistency levels in Azure Cosmos DB
- Azure Cosmos DB SQL API client library for Python
- CosmosClient class documentation
- Cosmos AAD Authentication
- Cosmos python examples
- Use control plane role-based access control with Azure Cosmos DB for NoSQL
- Use data plane role-based access control with Azure Cosmos DB for NoSQL
- Create or update Azure custom roles using Azure CLI
Table of Contents
- Important Considerations for Production Environment
- Overview
- Step 1: Set Up Your Azure Environment
- Step 2: Set Up Azure Blob Storage for PDF Ingestion
- Step 3: Set Up Azure Cosmos DB
- Step 4: Set Up Azure Document Intelligence
- Create Document Intelligence Resource
- Configure Models
- Using Prebuilt Models
- Training Custom Models (optional/if needed)
- Step 5: Set Up Azure Functions for Document Ingestion and Processing
- Step 6: Test the solution
Private Network Configuration
For enhanced security, consider configuring your Azure resources to operate within a private network. This can be achieved using Azure Virtual Network (VNet) to isolate your resources and control inbound and outbound traffic. Implementing private endpoints for services like Azure Blob Storage and Azure Functions can further secure your data by restricting access to your VNet.
Security
Ensure that you implement appropriate security measures when deploying this solution in a production environment. This includes:
- Securing Access: Use Microsoft Entra ID (formerly known as Azure Active Directory or Azure AD) for authentication and role-based access control (RBAC) to manage permissions.
- Managing Secrets: Store sensitive information such as connection strings and API keys in Azure Key Vault.
- Data Encryption: Enable encryption for data at rest and in transit to protect sensitive information.
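As a small illustration of the Managing Secrets point: Function App settings can hold a Key Vault reference of the form `@Microsoft.KeyVault(SecretUri=...)` instead of the raw secret. The sketch below only builds that reference string. The helper name, the vault name `contoso-kv` and the secret name `cosmos-db-key` are hypothetical and are not resources created in this walkthrough.

```python
def key_vault_reference(vault_name: str, secret_name: str) -> str:
    """Build an app-setting value that points at a Key Vault secret.

    Both arguments are illustrative; use your own vault and secret names.
    """
    if not vault_name or not secret_name:
        raise ValueError("vault_name and secret_name are required")
    secret_uri = f"https://{vault_name}.vault.azure.net/secrets/{secret_name}/"
    return f"@Microsoft.KeyVault(SecretUri={secret_uri})"


# Hypothetical names, for illustration only.
print(key_vault_reference("contoso-kv", "cosmos-db-key"))
```

For the reference to resolve, the Function App identity would also need permission to read secrets from that vault.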
Scalability
While this example provides a basic setup, you may need to scale the resources based on your specific requirements. Azure services offer various scaling options to handle increased workloads. Consider using:
- Auto-scaling: Configure auto-scaling for Azure Functions and other services to automatically adjust based on demand.
- Load Balancing: Use Azure Load Balancer or Application Gateway to distribute traffic and ensure high availability.
Cost Management
Monitor and manage the costs associated with your Azure resources. Use Azure Cost Management and Billing to track usage and optimize resource allocation.
Compliance
Ensure that your deployment complies with relevant regulations and standards. Use Azure Policy to enforce compliance and governance policies across your resources.
Disaster Recovery
Implement a disaster recovery plan to ensure business continuity in case of failures. Use Azure Site Recovery and backup solutions to protect your data and applications.
Azure Document Intelligence, formerly known as Form Recognizer, is a powerful AI service that extracts structured data from documents. It uses machine learning models to analyze and process various types of documents, such as invoices, receipts, business cards, and more.
Key Features | Details |
---|---|
Prebuilt Models | - Invoice Model: Extracts fields like invoice ID, date, vendor information, line items, totals, and more. - Receipt Model: Extracts merchant name, transaction date, total amount, and line items. - Business Card Model: Extracts contact information such as name, company, phone number, and email. |
Custom Models | - Training: You can train custom models using labeled data. This involves uploading a set of documents and manually labeling the fields you want to extract. - Model Management: Manage versions of your custom models, retrain them with new data, and evaluate their performance. |
APIs and SDKs | - REST API: Provides endpoints for analyzing documents, managing models, and retrieving results. - SDKs: Available in multiple languages (e.g., Python, C#, JavaScript) to simplify integration into your applications. |
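The prebuilt models in the table are selected by model ID when a document is submitted for analysis. A minimal sketch of that mapping follows, assuming the v3.x model IDs used by the `azure-ai-formrecognizer` SDK (`prebuilt-invoice` is the one this walkthrough uses later). The dictionary and helper names are hypothetical.

```python
# Model IDs as exposed by the v3.x Document Intelligence API (assumption:
# newer API versions may rename or retire some of them).
PREBUILT_MODEL_IDS = {
    "invoice": "prebuilt-invoice",
    "receipt": "prebuilt-receipt",
    "business_card": "prebuilt-businessCard",
}


def prebuilt_model_id(document_type: str) -> str:
    """Return the prebuilt model ID for a document type such as 'Invoice'."""
    key = document_type.strip().lower().replace(" ", "_")
    if key not in PREBUILT_MODEL_IDS:
        raise ValueError(f"No prebuilt model configured for {document_type!r}")
    return PREBUILT_MODEL_IDS[key]


print(prebuilt_model_id("Invoice"))
```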
Important
Regarding Networking, this example will cover Public access configuration and system-assigned managed identity. However, please ensure you review your privacy requirements and adjust network and access settings as necessary for your specific case.
An Azure Resource Group is a container that holds related resources for an Azure solution. It can include all the resources for the solution or only those you want to manage as a group. Typically, resources that share the same lifecycle are added to the same resource group, allowing for easier deployment, updating, and deletion as a unit. Resource groups also store metadata about the resources, and you can apply access control, locks, and tags to them for better management and organization.
- Create an Azure Account: If you don't have one, sign up for an Azure account.
- Create a Resource Group:
An Azure Storage Account provides a unique namespace in Azure for your data, allowing you to store and manage various types of data such as blobs, files, queues, and tables. It serves as the foundation for all Azure Storage services, ensuring high availability, scalability, and security for your data.
- In the Azure portal, navigate to your Resource Group.
- Click + Create.
- Search for Storage Account.
- Select the Resource Group you created.
- Enter a Storage Account name (e.g., invoicecontosostorage).
- Choose the region and performance options, and click Next to continue.
- If you need to modify anything related to Security, Access protocols, or Blob Storage Tier, you can do that in the Advanced tab.
- Regarding Networking, this example will cover Public access configuration. However, please ensure you review your privacy requirements and adjust network and access settings as necessary for your specific case.
- Click Review + create and then Create. Once it is done, you'll be able to see it in your Resource Group.
A Blob Container is a logical grouping of blobs within an Azure Storage Account, similar to a directory in a file system. Containers help organize and manage blobs, which can be any type of unstructured data like text or binary data. Each container can store an unlimited number of blobs, and you must create a container before uploading any blobs.
Within the Storage Account, create a Blob Container to store your PDFs.
- Go to your Storage Account.
- Under Data storage, select Containers.
- Click + Container.
- Enter a name for the container (e.g., pdfinvoices) and set the public access level to Private.
- Click Create.
If you plan to use access keys, please ensure that the setting "Allow storage account key access" is enabled. When this setting is disabled, any requests to the account authorized with Shared Key, including shared access signatures (SAS), will be denied. Click here to learn more

Azure Cosmos DB is a globally distributed, multi-model database service provided by Microsoft Azure. It is designed to offer high availability, scalability, and low-latency access to data for modern applications. Unlike traditional relational databases, Cosmos DB is a NoSQL database, meaning it can handle unstructured, semi-structured, and structured data types. It supports multiple data models, including document, key-value, graph, and column-family, making it versatile for various use cases.
- In the Azure portal, navigate to your Resource Group.
- Click + Create.
- Search for Cosmos DB, click on Create:
- Choose your desired API type; for this example we will be using Azure Cosmos DB for NoSQL. This option supports a SQL-like query language, which is familiar and powerful for querying and analyzing your invoice data. It also integrates well with various client libraries, making development easier and more flexible.
- Enter an account name (e.g., contosoinvoiceaicosmos). As with the previously configured resources, we will use the Public network for this example. Ensure that you adjust the architecture to include your networking requirements.
- Select the region and other settings.
- Click Review + create and then Create.
An Azure Cosmos DB container is a logical unit within a Cosmos DB database where data is stored. Containers are schema-agnostic, meaning they can store items with different structures. Each container is automatically partitioned to scale out across multiple servers, providing virtually unlimited throughput and storage. Containers are the primary scalability unit in Cosmos DB, and they use a partition key to distribute data efficiently across partitions.
- Go to your Cosmos DB account.
- Under Data Explorer, click New Database.
- Enter a database name (e.g., ContosoDBDocIntellig) and click OK.
- Click New Container.
- Enter a container name (e.g., Invoices) and set the partition key (e.g., /transactionId).
- Click OK.
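Because the container above is partitioned on /transactionId, each stored item would normally carry a `transactionId` property next to its `id`. The sketch below shows one way to shape an item that way. The helper name is hypothetical, and reusing the item `id` as the transaction ID when none is supplied is an assumption made for illustration, not something the walkthrough prescribes.

```python
import uuid
from typing import Any, Dict, Optional


def to_cosmos_item(invoice: Dict[str, Any], transaction_id: Optional[str] = None) -> Dict[str, Any]:
    """Return a copy of the invoice with 'id' and 'transactionId' set."""
    item = dict(invoice)  # shallow copy so the caller's dict is not mutated
    item.setdefault("id", str(uuid.uuid4()))
    # Assumption: fall back to the item id when no transaction id is known.
    item["transactionId"] = transaction_id or item["id"]
    return item


item = to_cosmos_item({"customer_name": "Contoso"}, transaction_id="tx-001")
print(item["transactionId"])
```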
Azure Document Intelligence offers robust capabilities for extracting structured data from various document types using advanced machine learning models. Technically, it provides prebuilt models for common documents like invoices, receipts, and business cards, which can quickly extract key information without custom training. For more specific needs, it allows training custom models using labeled data, enabling precise extraction tailored to unique document formats. The service is accessible via REST APIs and SDKs in multiple languages, facilitating seamless integration into applications. It supports key-value pair extraction, table recognition, and text extraction, making it a powerful tool for automating data entry, enhancing document management systems, and streamlining business processes.
- Go to the Azure Portal.
- Create a New Resource:
- Configure the Resource:
  - Subscription: Select your Azure subscription.
  - Resource Group: Choose an existing resource group or create a new one.
  - Region: Select the region closest to your location.
  - Name: Provide a unique name for your Form Recognizer resource.
  - Pricing Tier: Choose the pricing tier that fits your needs (e.g., Standard S0).
- Review your settings and click Create to deploy the resource.
- Access Form Recognizer Studio:
- Select Prebuilt Models: Choose the prebuilt model that matches your document type (e.g., "Invoices" for your PDF example).
- If the service resource for usage and billing is not configured, a window will appear requesting the resource information. In this case, we will use the one we recently created.
- Analyze Document:
- Prepare Training Data:
  - Collect a set of sample documents similar to your PDF example.
  - Label the fields you want to extract using the Form Recognizer Labeling Tool. Click here for more information about how to use it.
- Upload Training Data: Upload the labeled documents to an Azure Blob Storage container.
- Grant the necessary role (Storage Blob Data Reader) to the Document Intelligence Account for the Storage Account to access the information. Otherwise, you may encounter an error like this:
  - For this example we'll be using the system-assigned identity to do that. Under Identity within your Document Intelligence Account, change the status to On, and click on Save:
    A system-assigned managed identity is restricted to one per resource and is tied to the lifecycle of this resource. You can grant permissions to the managed identity by using Azure role-based access control (Azure RBAC). The managed identity is authenticated with Microsoft Entra ID, so you don't have to store any credentials in code.
  - Go to your Storage Account, under Access Control (IAM) click on + Add, and then Add role assignment:
  - Search for Storage Blob Data Reader, click Next. Then, click on select members and search for your Document Intelligence identity. Finally click on Review + assign:
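The same Storage Blob Data Reader assignment can also be scripted with `az role assignment create` rather than clicked through in the portal. The sketch below only assembles the scope string and the command arguments so they can be reviewed before running. Both helper names are hypothetical, and the subscription, resource group, and principal ID values are placeholders to replace with your own.

```python
import shlex
from typing import List


def storage_account_scope(subscription_id: str, resource_group: str, account_name: str) -> str:
    """Build the Azure resource ID of a storage account, used as the RBAC scope."""
    return (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{account_name}"
    )


def blob_reader_assignment_command(principal_id: str, scope: str) -> List[str]:
    """Return the az CLI arguments that grant Storage Blob Data Reader."""
    return [
        "az", "role", "assignment", "create",
        "--assignee-object-id", principal_id,
        "--assignee-principal-type", "ServicePrincipal",
        "--role", "Storage Blob Data Reader",
        "--scope", scope,
    ]


# Placeholder values; the principal ID is the object ID shown under Identity.
scope = storage_account_scope("<subscription-id>", "<resource-group>", "invoicecontosostorage")
print(shlex.join(blob_reader_assignment_command("<principal-id>", scope)))
```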
- In the Form Recognizer Studio, select Custom extraction model.
- Scroll down, and click on Create a project (e.g., pdfinvoiceproject, Extract information from pdf invoices):
- Configure the service resource for the project: choose the subscription, resource group, Document Intelligence or Cognitive Service Resource, and the API version.
- Connect training data source: Provide the information of the Azure Blob Storage account and the folder that contains your training data.
- You can also Auto label if it's required:
- Test the Model:
  - Upload a new document to test the custom model.
  - Verify that the model correctly extracts the desired fields.
An Azure Function App is a container for hosting individual Azure Functions. It provides the execution context for your functions, allowing you to manage, deploy, and scale them together. Each function app can host multiple functions, which are small pieces of code that run in response to various triggers or events, such as HTTP requests, timers, or messages from other Azure services.
Azure Functions are designed to be lightweight and event-driven, enabling you to build scalable and serverless applications. You only pay for the resources your functions consume while they are running, making it a cost-effective solution for many scenarios.
- In the Azure portal, go to your Resource Group.
- Click + Create.
- Search for Function App, click on Create:
- Choose a hosting option; for this example, we will use Functions Premium. Click here for a quick overview of hosting options:
- Enter a name for the Function App (e.g., ContosoFAaiDocIntellig).
- Choose your runtime stack (e.g., .NET or Python).
- Select the region and other settings.
- Select Review + create and then Create. Verify the resources created in your Resource Group.
Important
This example uses a system-assigned managed identity for the role-based access control (RBAC) assignments.
- Assign the Storage Blob Data Contributor and Storage File Data SMB Share Contributor roles to the Function App within the Storage Account related to the runtime (the one created with the function app).
- Assign Storage Blob Data Reader to the Function App within the Storage Account that contains the invoices, click Next. Then, click on select members and search for your Function App identity. Finally click on Review + assign:
- Also add Cosmos DB Operator, DocumentDB Account Contributor, Azure AI Administrator, Cosmos DB Account Reader Role, Contributor:
- To assign the Microsoft.DocumentDB/databaseAccounts/readMetadata permission, you need to create a custom role in Azure Cosmos DB. This permission is required for accessing metadata in Cosmos DB. Click here to understand more about it.

Aspect | Data Plane Access | Control Plane Access |
---|---|---|
Scope | Focuses on data operations within databases and containers. This includes actions such as reading, writing, and querying data in your databases and containers. | Focuses on management operations at the account level. This includes actions such as creating, deleting, and configuring databases and containers. |
Roles | - Cosmos DB Built-in Data Reader: Provides read-only access to data within the databases and containers. - Cosmos DB Built-in Data Contributor: Allows read and write access to data within the databases and containers. - Cosmos DB Built-in Data Owner: Grants full access to manage data within the databases and containers. | - Contributor: Grants full access to manage all Azure resources, including Cosmos DB. - Owner: Grants full access to manage all resources, including the ability to assign roles in Azure RBAC. - Cosmos DB Account Contributor: Allows management of Cosmos DB accounts, including creating and deleting databases and containers. - Cosmos DB Account Reader: Provides read-only access to Cosmos DB account metadata. |
Permissions | - Reading documents - Writing documents - Managing data within containers. | - Creating or deleting databases and containers - Configuring settings - Managing account-level configurations. |
Authentication | Uses Azure Active Directory (AAD) tokens or resource tokens for authentication. | Uses Azure Active Directory (AAD) for authentication. |
Steps to assign it:
- Open Azure CLI: Go to the Azure portal and click on the icon for the Azure CLI.
- List Role Definitions: Run the following command to list all of the role definitions associated with your Azure Cosmos DB for NoSQL account. Review the output and locate the role definition named Cosmos DB Built-in Data Contributor.

      az cosmosdb sql role definition list \
          --resource-group "<your-resource-group>" \
          --account-name "<your-account-name>"

- Get Cosmos DB Account ID: Run this command to get the ID of your Cosmos DB account. Record the value of the id property as it is required for the next step.

      az cosmosdb show --resource-group "<your-resource-group>" --name "<your-account-name>" --query "{id:id}"

  Example output:

      { "id": "/subscriptions/{subscription-id}/resourceGroups/{resource-group-name}/providers/Microsoft.DocumentDB/databaseAccounts/{cosmos-account-name}" }

- Assign the Role: Assign the new role using az cosmosdb sql role assignment create. Use the previously recorded role definition ID for the --role-definition-id argument, the unique identifier for your identity for the --principal-id argument, and your account's ID for the --scope argument. You can extract the principal-id from Identity of the Function App:

      az cosmosdb sql role assignment create \
          --resource-group "<your-resource-group>" \
          --account-name "<your-account-name>" \
          --role-definition-id "<role-definition-id>" \
          --principal-id "<principal-id>" \
          --scope "/subscriptions/{subscription-id}/resourceGroups/{resource-group-name}/providers/Microsoft.DocumentDB/databaseAccounts/{cosmos-account-name}"

  After a few minutes, you will see something like this:
- Verify Role Assignment: Use az cosmosdb sql role assignment list to list all role assignments for your Azure Cosmos DB for NoSQL account. Review the output to ensure your role assignment was created.

      az cosmosdb sql role assignment list \
          --resource-group "<your-resource-group>" \
          --account-name "<your-account-name>"
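Locating the role definition ID by eye in the JSON printed by `az cosmosdb sql role definition list` is error-prone, so a small parser can help. The sketch below assumes each entry in that output has `roleName` and `id` properties. The sample JSON is a trimmed, hand-written illustration with placeholder path segments, not captured CLI output, and the helper name is hypothetical.

```python
import json


def find_role_definition_id(cli_output: str, role_name: str) -> str:
    """Return the 'id' of the role definition whose 'roleName' matches."""
    for definition in json.loads(cli_output):
        if definition.get("roleName") == role_name:
            return definition["id"]
    raise LookupError(f"Role definition {role_name!r} not found")


# Hand-written sample in the assumed output shape (placeholders, not real IDs).
sample_output = json.dumps([
    {
        "id": "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
              "/providers/Microsoft.DocumentDB/databaseAccounts/<account-name>"
              "/sqlRoleDefinitions/00000000-0000-0000-0000-000000000002",
        "roleName": "Cosmos DB Built-in Data Contributor",
    }
])

print(find_role_definition_id(sample_output, "Cosmos DB Built-in Data Contributor"))
```

The returned value is what the `--role-definition-id` argument expects in the assignment step.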
- Under Settings, go to Environment variables, and + Add the following variables:
  - COSMOS_DB_ENDPOINT: Your Cosmos DB account endpoint.
  - COSMOS_DB_KEY: Your Cosmos DB account key.
  - COSMOS_DB_CONNECTION_STRING: Your Cosmos DB connection string.
  - invoicecontosostorage_STORAGE: Your Storage Account connection string.
  - FORM_RECOGNIZER_ENDPOINT: For example: https://<your-form-recognizer-endpoint>.cognitiveservices.azure.com/
  - FORM_RECOGNIZER_KEY: Your Document Intelligence key (Form Recognizer).
  - FUNCTIONS_EXTENSION_VERSION: ~4 (check whether this setting exists; if not, create it).
  - FUNCTIONS_NODE_BLOCK_ON_ENTRY_POINT_ERROR: true (this setting ensures that all entry point errors are visible in your Application Insights logs).
  - Click on Apply to save your configuration.
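A missing or empty setting usually surfaces only when the function first runs, so it can help to check the variables up front. The sketch below is one hedged way to do that. The helper name and the choice of which settings count as required are assumptions, based on the names the sample function in this guide reads.

```python
import os
from typing import List, Mapping, Sequence

# Settings the sample function reads at runtime (assumption: adjust this list
# if your function uses different names).
REQUIRED_SETTINGS = (
    "COSMOS_DB_ENDPOINT",
    "FORM_RECOGNIZER_ENDPOINT",
    "FORM_RECOGNIZER_KEY",
    "invoicecontosostorage_STORAGE",
)


def missing_settings(env: Mapping[str, str] = os.environ,
                     required: Sequence[str] = REQUIRED_SETTINGS) -> List[str]:
    """Return the names of required settings that are absent or blank."""
    return [name for name in required if not env.get(name, "").strip()]


# Example with an in-memory mapping instead of the real environment.
example_env = {"COSMOS_DB_ENDPOINT": "https://example.documents.azure.com:443/"}
print(missing_settings(example_env))
```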
- You need to install VSCode.
- Install Python from the Microsoft Store:
- Open VSCode, and install some extensions: python, and Azure Tools.
- Click on the Azure icon, and sign in to your account. Allow the extension Azure Resources to sign in using Microsoft; it will open a browser window. After doing so, you will be able to see your subscription and resources.
- Under Workspace, click on Create Function Project, and choose a path on your local computer to develop your function.
- Choose the language, in this case python:
- Select the model version; for this example let's use v2:
- For the Python interpreter, let's use the one installed via Microsoft Store:
- Choose a template (e.g., Blob trigger) and configure it to trigger on new PDF uploads in your Blob container.
- Provide a function name, like BlobTriggerContosoPDFInvoicesDocIntelligence:
- Next, it will prompt you for the path of the blob container where you expect the function to be triggered after a file is uploaded. In this case it is pdfinvoices, as previously created.
- Click on Create new local app settings, and then choose your subscription.
- Choose Azure Storage Account for remote storage, and select one. I'll be using invoicecontosostorage.
- Then click on Open in the current window. You will see something like this:
- Now we need to update the function code to extract data from PDFs and store it in Cosmos DB. Use this as an example:
  - PDF Upload: A PDF is uploaded to the Azure Blob Storage container named pdfinvoices.
  - Trigger Azure Function: The upload triggers the Azure Function BlobTriggerContosoPDFInvoicesDocIntelligence.
  - Initialize Clients: Sets up connections to Document Intelligence and Cosmos DB.
    - The function initializes the DocumentAnalysisClient to interact with Azure Document Intelligence.
    - It also initializes the CosmosClient to interact with Cosmos DB.
  - Read PDF from Blob Storage: The function reads the PDF content from the Blob Storage into a byte stream.
  - Analyze PDF: Uses Document Intelligence to extract data.
    - The function calls the begin_analyze_document method of the DocumentAnalysisClient using the prebuilt invoice model to analyze the PDF.
    - It waits for the analysis to complete and retrieves the results.
  - Extract Data: Structures the extracted data.
    - The function extracts relevant fields from the analysis result: customer name, email, and address; company name, phone, and address; and rental details.
    - It structures this extracted data into a dictionary (invoice_data).
  - Save Data to Cosmos DB: Inserts the data into Cosmos DB.
    - The function calls save_invoice_data_to_cosmos to save the structured data into Cosmos DB.
    - It ensures the database and container exist, then inserts the extracted data.
  - Logging (process and errors): Throughout the process, the function logs various steps and any errors encountered for debugging and monitoring purposes.
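The Extract Data step can be tried without calling Azure at all. The sketch below mirrors the field-to-string mapping described above, using `SimpleNamespace` objects as stand-ins for the SDK's field objects (anything with a `.value` attribute). The `extract_parties` helper and the sample values are hypothetical and cover only two of the fields.

```python
from types import SimpleNamespace
from typing import Any, Dict


def serialize_field(field: Any) -> str:
    """Return the field's value as a string, or '' when the field is missing."""
    return str(field.value) if field else ""


def extract_parties(fields: Dict[str, Any]) -> Dict[str, str]:
    """Map two prebuilt-invoice fields to the keys used in invoice_data."""
    return {
        "customer_name": serialize_field(fields.get("CustomerName")),
        "company_name": serialize_field(fields.get("VendorName")),
    }


# Stand-in for document.fields; VendorName is deliberately absent.
fake_fields = {"CustomerName": SimpleNamespace(value="Contoso Rentals")}
print(extract_parties(fake_fields))
```

Fields the model does not return come back as empty strings, so the stored document keeps a stable shape.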
- Update the function_app.py:

Updated function code:

    import logging
    import os
    import uuid

    import azure.functions as func
    from azure.ai.formrecognizer import DocumentAnalysisClient
    from azure.core.credentials import AzureKeyCredential
    from azure.cosmos import CosmosClient, PartitionKey, exceptions
    from azure.identity import DefaultAzureCredential

    app = func.FunctionApp(http_auth_level=func.AuthLevel.FUNCTION)

    ## DEFINITIONS
    def initialize_form_recognizer_client():
        endpoint = os.getenv("FORM_RECOGNIZER_ENDPOINT")
        key = os.getenv("FORM_RECOGNIZER_KEY")
        if not isinstance(key, str):
            raise ValueError("FORM_RECOGNIZER_KEY must be a string")
        logging.info(f"Form Recognizer endpoint: {endpoint}")
        return DocumentAnalysisClient(endpoint=endpoint, credential=AzureKeyCredential(key))

    def read_pdf_content(myblob):
        logging.info(f"Reading PDF content from blob: {myblob.name}")
        return myblob.read()

    def analyze_pdf(form_recognizer_client, pdf_bytes):
        logging.info("Starting PDF analysis.")
        poller = form_recognizer_client.begin_analyze_document(
            model_id="prebuilt-invoice",
            document=pdf_bytes
        )
        logging.info("PDF analysis in progress.")
        # Block until the long-running analysis finishes
        return poller.result()

    def extract_invoice_data(result):
        logging.info("Extracting invoice data from analysis result.")
        invoice_data = {
            "id": str(uuid.uuid4()),
            "customer_name": "",
            "customer_email": "",
            "customer_address": "",
            "company_name": "",
            "company_phone": "",
            "company_address": "",
            "rentals": []
        }

        def serialize_field(field):
            if field:
                return str(field.value)  # Convert to string
            return ""

        for document in result.documents:
            fields = document.fields
            invoice_data["customer_name"] = serialize_field(fields.get("CustomerName"))
            invoice_data["customer_email"] = serialize_field(fields.get("CustomerEmail"))
            invoice_data["customer_address"] = serialize_field(fields.get("CustomerAddress"))
            invoice_data["company_name"] = serialize_field(fields.get("VendorName"))
            invoice_data["company_phone"] = serialize_field(fields.get("VendorPhoneNumber"))
            invoice_data["company_address"] = serialize_field(fields.get("VendorAddress"))

            items = fields.get("Items").value if fields.get("Items") else []
            for item in items:
                item_value = item.value if item.value else {}
                rental = {
                    "rental_date": serialize_field(item_value.get("Date")),
                    "title": serialize_field(item_value.get("Description")),
                    "description": serialize_field(item_value.get("Description")),
                    "quantity": serialize_field(item_value.get("Quantity")),
                    "total_price": serialize_field(item_value.get("TotalPrice"))
                }
                invoice_data["rentals"].append(rental)

        logging.info(f"Successfully extracted invoice data: {invoice_data}")
        return invoice_data

    def save_invoice_data_to_cosmos(invoice_data):
        try:
            endpoint = os.getenv("COSMOS_DB_ENDPOINT")
            # Authenticate with the managed identity instead of the account key
            aad_credentials = DefaultAzureCredential()
            client = CosmosClient(endpoint, credential=aad_credentials, consistency_level='Session')
            logging.info("Successfully connected to Cosmos DB using AAD default credential")
        except Exception as e:
            logging.error(f"Error connecting to Cosmos DB: {e}")
            return

        database_name = "ContosoDBDocIntellig"
        container_name = "Invoices"

        # Returns the existing database, or creates it when it is missing
        database = client.create_database_if_not_exists(database_name)
        logging.info(f"Database '{database_name}' is available.")
        database.read()
        logging.info(f"Reading into '{database_name}' DB")

        try:
            # Try to create the container; fall back to the existing one
            container = database.create_container(
                id=container_name,
                partition_key=PartitionKey(path="/transactionId"),
                offer_throughput=400
            )
            logging.info(f"Container '{container_name}' did not exist. Created it.")
        except exceptions.CosmosResourceExistsError:
            container = database.get_container_client(container_name)
            logging.info(f"Container '{container_name}' already exists.")

        container.read()
        logging.info(f"Reading into '{container_name}' container")

        try:
            # invoice_data carries no 'transactionId' property; consider adding
            # one so items are spread across logical partitions.
            response = container.upsert_item(invoice_data)
            logging.info(f"Saved processed invoice data to Cosmos DB: {response}")
        except Exception as e:
            logging.error(f"Error inserting item into Cosmos DB: {e}")

    ## MAIN
    @app.blob_trigger(arg_name="myblob", path="pdfinvoices/{name}",
                      connection="invoicecontosostorage_STORAGE")
    def BlobTriggerContosoPDFInvoicesDocIntelligence(myblob: func.InputStream):
        logging.info(f"Python blob trigger function processed blob\n"
                     f"Name: {myblob.name}\n"
                     f"Blob Size: {myblob.length} bytes")

        try:
            form_recognizer_client = initialize_form_recognizer_client()
            pdf_bytes = read_pdf_content(myblob)
            logging.info("Successfully read PDF content from blob.")
        except Exception as e:
            logging.error(f"Error reading PDF: {e}")
            return

        try:
            result = analyze_pdf(form_recognizer_client, pdf_bytes)
            logging.info("Successfully analyzed PDF using Document Intelligence.")
        except Exception as e:
            logging.error(f"Error analyzing PDF: {e}")
            return

        try:
            invoice_data = extract_invoice_data(result)
            logging.info(f"Extracted invoice data: {invoice_data}")
        except Exception as e:
            logging.error(f"Error extracting invoice data: {e}")
            return

        try:
            save_invoice_data_to_cosmos(invoice_data)
            logging.info("Successfully saved invoice data to Cosmos DB.")
        except Exception as e:
            logging.error(f"Error saving invoice data to Cosmos DB: {e}")
- Now, let's update the requirements.txt:

Updated requirements.txt:

    azure-functions
    azure-ai-formrecognizer
    azure-core
    azure-cosmos==4.3.0
    azure-identity==1.7.0
- Since this function has already been tested, you can deploy your code to the function app in your subscription. If you want to test it first, you can run your function locally.
- Click on the Azure icon.
- Under workspace, click on the Function App icon.
- Click on Deploy to Azure.
  <img width="550" alt="image" src="https://github.com/user-attachments/assets/12405c04-fa43-4f09-817d-f6879fbff035">
- Select your subscription, your function app, and accept the prompt to overwrite:
  <img width="550" alt="image" src="https://github.com/user-attachments/assets/1882e777-6ba0-4e18-9d7b-5937204c7217">
- After completing, you will see the status in your terminal:
  <img width="550" alt="image" src="https://github.com/user-attachments/assets/aa090cfc-f5b3-4ef2-9c2d-6be4f00b83b8"> <img width="550" alt="image" src="https://github.com/user-attachments/assets/369ecfc7-cc31-403c-a625-bb1f6caa271c">
Important
If you need further assistance with the code, please click here to view all the function code.
Note
Please ensure that all specified roles are assigned to the Function App. The provided example used a system-assigned identity for the Function App to facilitate the role assignment.
Important
Please ensure that the user/system admin responsible for uploading the PDFs to the blob container has the necessary permissions. The error below illustrates what might occur if these roles are missing.
In that case, go to Access Control (IAM), click on + Add, and Add role assignment:
Search for Storage Blob Data Contributor, click Next.
Then, click on select members and search for your user/system admin. Finally click on Review + assign.
Upload sample PDF invoices to the Blob container and verify that data is correctly ingested and stored in Cosmos DB.
- Click on Upload, then select Browse for files and choose your PDF invoices to be stored in the blob container, which will trigger the function app to parse them.
- Check the logs and traces from your function with Application Insights:
- Under Investigate, click on Performance. Filter by time range, and drill into the samples. Sort the results by date (if you have many, like in my case) and click on the last one.
- Click on View all:
- Check all the logs and traces generated. Also review the information parsed:
- Validate that the information was uploaded to the Cosmos DB. Under Data Explorer, check your Database.
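Besides browsing in Data Explorer, the stored invoices can be checked with a parameterized query. The sketch below only builds the query text and its parameters. The property names `customer_name` and `rentals` come from the function code in this guide, while the helper name and the sample customer value are hypothetical.

```python
from typing import Any, Dict


def invoices_by_customer_query(customer_name: str) -> Dict[str, Any]:
    """Build a parameterized Cosmos DB SQL query for one customer's invoices."""
    return {
        "query": (
            "SELECT c.id, c.customer_name, c.rentals "
            "FROM c WHERE c.customer_name = @customer"
        ),
        "parameters": [{"name": "@customer", "value": customer_name}],
    }


spec = invoices_by_customer_query("Contoso Rentals")
print(spec["query"])
print(spec["parameters"])
```

With the `azure-cosmos` SDK, the two parts can be passed to `container.query_items(query=spec["query"], parameters=spec["parameters"], enable_cross_partition_query=True)`. Using a parameter instead of string concatenation keeps user-supplied names out of the query text.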