Warning
This Solutions Accelerator is in its early stages and is subject to change.
- Overview
- High-Level Architecture
- Architecture
- Components
- Workflows
- Prerequisites
- Programming and Tools Used
- Technologies
- Usage Instructions
Azure Voice Notes is a cloud-based audio processing system that allows users to efficiently process voice recordings by transcribing them into text and generating summarized reports. It provides an end-to-end workflow from audio input to structured reports.
In our research, many organizations reported that their staff lose around 70% of their time to transcribing and summarizing audio. Some organizations stated that they cannot process all of their internal audio due to limited capacity and high costs. For example, call centers struggle with the operational overhead and expense of manually transcribing and summarizing conversations.
To address these pain points, this project provides an automated solution that streamlines transcription, summarization, and structured report generation. Organizations can customize their workflows and significantly reduce time and costs while transforming unstructured voice inputs into structured, measurable outputs, which makes voice data easier to analyze and access.
graph TD;
User[User] -->|Records Audio| AudioProcessor[Audio Processor]
AudioProcessor -->|Transcribes & Summarizes| Reports[Measurable Reports and Analytics]
Reports -->|Viewed by| Management[Management]
Reports -->|Used for Decision Making| DecisionMakers[Decision Makers]
Reports -->|Used for Compliance| ComplianceTeams[Compliance Teams]
- Medical Summarization for Appointments: Transcribe and summarize doctor-patient conversations to create structured medical notes.
- Social Workers Summarization for Appointments: Convert case discussions into structured reports for better case tracking and management.
- Call Center QA for Summarization and Analysis: Analyze customer support interactions, extract insights, and generate quality assurance reports.
- Legal Documentation: Transcribe legal proceedings and meetings into structured, searchable documents.
- Academic Research and Interviews: Automatically convert research interviews into summarized reports for easier analysis.
- Business Meetings & Conference Calls: Generate structured summaries from meeting recordings to improve collaboration and documentation.
This system leverages various Microsoft Azure tools for processing and storage, including:
- Azure Static Web Apps: Provides a web interface for user interaction.
- Azure App Service: Handles backend logic for user management, file handling, and workflow execution.
- Azure Blob Storage: Stores audio recordings, transcriptions, and reports.
- Azure Functions: Processes voice recordings asynchronously, handling transcription and summarization tasks.
- Azure Speech-to-Text API: Converts audio into structured text.
- Azure OpenAI GPT-4o: Summarizes transcriptions and refines text output.
- CosmosDB (Document, Serverless): Manages metadata, logs, and user records.
Note
The architecture separates components and follows an asynchronous model to ensure scalability and a balanced cost structure. It integrates Azure Static Web Apps, Azure App Service, Azure Cosmos DB, Azure Functions, Blob Storage, Speech-to-Text, and OpenAI GPT-4o to automate transcription, summarization, and report generation.
graph TD;
User[User] -->|Uploads Voice Recording| WebApp[Azure Static Web]
WebApp -->|Sends File| AppService[Azure App Service]
AppService -->|Stores Recording| BlobStorage[Azure Blob Storage]
BlobStorage -->|Triggers Function| Function[Azure Function]
Function -->|Submits Transcription| SpeechToText[Azure AI Speech-to-Text]
SpeechToText -->|Returns Transcribed Text| Function
Function -->|Stores Transcription| TranscribedBlob[Azure Blob Storage - Transcribed]
Function -->|Retrieves Prompt| CosmosDB[CosmosDB - Document, Serverless ]
Function -->|Processes Summarization| OpenAIGPT[Azure OpenAI GPT-4o]
OpenAIGPT -->|Returns Summary| Function
Function -->|Stores Summary| SummaryBlob[Azure Blob Storage - Results]
Function -->|Updates Status| CosmosDB
AppService -->|Fetches Processed Data| CosmosDB
WebApp -->|Fetches Transcription and Analysis Report| AppService
User -->|Views Report & Listens to Recordings| WebApp
The system consists of multiple components working together:
- Azure Static Web Apps - A React app that provides the web interface where end users upload voice recordings, view transcripts, and access reports. The UI displays the transcription and the PDF analysis report, both retrieved from the database. Users can add prompts and customize the prompt used for report summarization. The app also includes login and registration and uses Azure App Service as its backend.
- Azure App Service - The backend service, written with FastAPI. It interacts mainly with CosmosDB, the frontend, and Azure Blob Storage to upload files, register users, log users in, and retrieve data from the database.
- Azure Blob Storage (Recordings) - Stores uploaded voice recordings.
- Azure Functions - A Python function triggered by Azure Blob Storage each time a new recording is uploaded. It runs asynchronously for each new blob. The function handles transcription, uploads the transcription, updates the database, fetches the prompt from the database, and summarizes the text using GPT-4o.
- Azure Speech-to-Text - Converts voice recordings into text.
- Azure OpenAI GPT-4o - Summarizes the transcribed text.
- Azure Blob Storage (Results) - Stores processed transcripts and summarized reports.
- CosmosDB (Document, Serverless) - Stores metadata, logs, and user activity for analytics and tracking.
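As a rough illustration of the metadata these components exchange, a job document could look like the sketch below. This is not the project's actual schema: the field names (`job_id`, `user_id`, `category_id`, `subcategory_id`, `recording_url`, `transcription_url`, `report_url`) and the `uploaded` and `failed` statuses are assumptions made for the example. Only the 'transcribing', 'transcribed', and 'completed' statuses come from this README.

```python
from dataclasses import dataclass, asdict
from typing import Optional

# Statuses named in this README plus two assumed extras ("uploaded", "failed").
ALLOWED_TRANSITIONS = {
    "uploaded": {"transcribing", "failed"},
    "transcribing": {"transcribed", "failed"},
    "transcribed": {"completed", "failed"},
    "completed": set(),
    "failed": set(),
}


@dataclass
class JobRecord:
    """Hypothetical shape of a job document; not the project's real schema."""
    job_id: str
    user_id: str
    category_id: str
    subcategory_id: str
    recording_url: str
    status: str = "uploaded"
    transcription_url: Optional[str] = None
    report_url: Optional[str] = None

    def advance(self, new_status: str) -> None:
        """Move to new_status, rejecting transitions the table does not allow."""
        if new_status not in ALLOWED_TRANSITIONS.get(self.status, set()):
            raise ValueError(f"cannot go from {self.status!r} to {new_status!r}")
        self.status = new_status

    def to_document(self) -> dict:
        """Return a plain dict keyed by 'id', the item key Cosmos DB requires."""
        doc = asdict(self)
        doc["id"] = doc.pop("job_id")
        return doc
```

Keeping the allowed transitions in one table makes it harder for the function and the backend to write a status that skips a step.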
- User Uploads Recording: The end user uploads a voice recording via the Azure Static Web Apps interface.
- File Storage & Processing Trigger: The Azure App Service stores the recording in Azure Blob Storage (Recordings), triggering an Azure Function.
- Azure Functions: Transcription and Summarization Processing:
  - File Retrieval: The function receives the file from Azure Blob Storage through the trigger.
  - Transcription Process:
    - The function submits the transcription job to Azure AI Speech "Speech to Text" and updates the database with the status 'transcribing'.
    - It waits and checks the transcription status until completion.
    - If successful, the transcribed text file is uploaded to Azure Blob Storage (transcribe.txt).
    - The database is updated with the status 'transcribed', and the Blob URL is stored.
  - Prompt Retrieval & Summarization:
    - The function retrieves the relevant prompt based on the job category and sub-category ID from the database.
    - The transcribed text is sent to Azure OpenAI GPT-4o, which generates a concise summary.
  - Report Storage & Completion Update:
    - The summarized report is stored in Azure Blob Storage (Results).
    - The function updates the database with the status 'completed' and stores the Blob URL of the analysis report file.
- Report Retrieval:
  - The Azure App Service retrieves the processed data from the database.
  - The UI fetches and displays the transcription and analysis report.
  - The user can listen to the recordings and view job details.
  - The user can download the final report.
- Logging & Analytics: Logging and metadata are stored in CosmosDB (Document, Serverless) for tracking and analytics.
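The "waits and checks the transcription status" step in the workflow can be sketched as a simple polling loop with a timeout. This is a minimal sketch, not the project's code: `get_status` is a hypothetical callable wrapping whatever status request the function makes to the speech service, and the status strings and default intervals are assumptions.

```python
import time
from typing import Callable


def wait_for_transcription(
    get_status: Callable[[], str],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() until it reports a terminal state or the timeout passes.

    get_status is a hypothetical wrapper around the speech service's status
    request; it is assumed to return "NotStarted", "Running", "Succeeded",
    or "Failed".
    """
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        if status in ("Succeeded", "Failed"):
            return status
        if clock() >= deadline:
            raise TimeoutError("transcription did not finish in time")
        sleep(poll_seconds)
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real waiting. The caller decides how to record a "Failed" result or a timeout in the database.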
sequenceDiagram
participant User
participant WebApp as Azure Static Web
participant AppService as Azure App Service
participant Blob as Azure Blob Storage
participant Function as Azure Function
participant Speech as Azure AI Speech-to-Text
participant AI as Azure OpenAI GPT-4o
participant Cosmos as CosmosDB
User->>WebApp: Upload Voice Recording
WebApp->>AppService: Send File for Processing
AppService->>Blob: Store File in Recordings
Blob->>Function: Trigger Processing
Function->>Speech: Submit Transcription Job
Speech-->>Function: Return Transcribed Text
Function->>Blob: Store Transcription File
Function->>Cosmos: Update Status 'Transcribed'
Function->>Cosmos: Retrieve Prompt
Function->>AI: Summarize Transcription
AI-->>Function: Return Summary
Function->>Blob: Store Summary File
Function->>Cosmos: Update Status 'Completed'
AppService->>Cosmos: Fetch Report Details
AppService->>WebApp: Provide Download Link
WebApp->>User: Report Available for Download
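The order of calls in the sequence above can be sketched as one orchestration function. This is an illustrative sketch under stated assumptions, not the project's implementation: every callable passed in is a hypothetical stand-in for an Azure client call, the job dictionary keys are invented for the example, and the report is stored as plain text under the assumed name `report.txt` for simplicity.

```python
from typing import Callable


def process_recording(
    job: dict,
    transcribe: Callable[[str], str],
    get_prompt: Callable[[str, str], str],
    summarize: Callable[[str, str], str],
    store_text: Callable[[str, str], str],
    update_job: Callable[[str, dict], None],
) -> dict:
    """Run one recording through transcription and summarization.

    All callables are hypothetical stand-ins:
      transcribe(recording_url) -> transcript text
      get_prompt(category_id, subcategory_id) -> prompt text
      summarize(prompt, transcript) -> summary text
      store_text(blob_name, text) -> blob URL
      update_job(job_id, fields) -> None
    """
    job_id = job["id"]

    # Transcription: mark the job, transcribe, store the text, record its URL.
    update_job(job_id, {"status": "transcribing"})
    transcript = transcribe(job["recording_url"])
    transcript_url = store_text(f"{job_id}/transcribe.txt", transcript)
    update_job(job_id, {"status": "transcribed", "transcription_url": transcript_url})

    # Summarization: fetch the prompt for this category, then summarize.
    prompt = get_prompt(job["category_id"], job["subcategory_id"])
    summary = summarize(prompt, transcript)

    # Completion: store the report and record its URL with the final status.
    report_url = store_text(f"{job_id}/report.txt", summary)
    update_job(job_id, {"status": "completed", "report_url": report_url})
    return {"transcription_url": transcript_url, "report_url": report_url}
```

Injecting the service calls keeps the ordering logic separate from the Azure SDKs, so the status updates can be checked with simple fakes.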
To deploy and run this project, you need:
- Microsoft Azure Account with access to the following services:
- Azure Static Web Apps
- Azure App Service
- Azure Blob Storage
- Azure Functions
- Azure Speech-to-Text API
- Azure OpenAI GPT-4o API
- CosmosDB (Document, Serverless)
- Node.js (for frontend development, if applicable)
- Python (backend implementation)
- Create an Azure Static Web App and deploy the frontend.
- Set up an Azure App Service to handle voice file uploads.
- Configure Azure Blob Storage with two containers: recordings and results.
- Deploy an Azure Function with a Blob trigger to process new files.
- Enable Azure Speech-to-Text API for transcription.
- Enable Azure OpenAI GPT-4o API for text summarization.
- Create a CosmosDB (Document, Serverless) instance to store logs.
- Set up a function with a Blob trigger that listens for new recordings.
- Integrate Azure Speech-to-Text and OpenAI GPT-4o APIs.
- Store processed data in Blob Storage (Results).
- Host the frontend on Azure Static Web Apps.
- Connect the web app to the backend Azure App Service.
- Implement authentication (if required).
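Once the services above are provisioned, the backend and the function need their endpoints and keys at runtime. The sketch below shows one way to load and validate such settings from environment variables. The variable names are hypothetical and the project's real configuration keys may differ. Only the `recordings` and `results` container names come from this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Hypothetical setting names; the project's real configuration keys may differ.
REQUIRED = (
    "STORAGE_CONNECTION_STRING",
    "SPEECH_KEY",
    "SPEECH_REGION",
    "OPENAI_ENDPOINT",
    "OPENAI_KEY",
    "COSMOS_ENDPOINT",
)


@dataclass(frozen=True)
class Settings:
    storage_connection_string: str
    speech_key: str
    speech_region: str
    openai_endpoint: str
    openai_key: str
    cosmos_endpoint: str
    recordings_container: str = "recordings"
    results_container: str = "results"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from env, failing early if a required value is missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing settings: " + ", ".join(missing))
    values = {name.lower(): env[name] for name in REQUIRED}
    return Settings(
        recordings_container=env.get("RECORDINGS_CONTAINER", "recordings"),
        results_container=env.get("RESULTS_CONTAINER", "results"),
        **values,
    )
```

Failing at startup with a list of missing names is usually easier to diagnose than a failure in the middle of processing a recording.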
- Frontend: Azure Static Web Apps
- Backend: Azure App Service, Azure Functions
- Storage: Azure Blob Storage
- AI Processing:
- Azure Speech-to-Text
- Azure OpenAI GPT-4o
- Database: CosmosDB (Document, Serverless)
We appreciate the efforts and contributions of the following individuals:
Name | |
---|---|
Moustafa Mahmoud | MoustafaAMahmoud |
Wolfgang Knupp | WolfgangKnupp |
Artur Zielinski | ArturZielinski |
If you've contributed and would like your contact info added, feel free to submit a PR!