Architecture Guide

This repository presents a proof of concept that shows how to use Azure Databricks and Azure Kubernetes Service to build an MLOps platform for online inference workloads. The solution manages the end-to-end machine learning life cycle and incorporates important MLOps principles for developing, deploying, and monitoring machine learning models at scale.

This approach can be extended to batch inference workloads and to other services that help manage APIs at scale, such as Azure API Management.

Potential use cases

This approach is best suited for:

Teams that have standardized on Databricks for data engineering or machine learning applications.
Teams that have experience deploying and managing Kubernetes workloads with a preference to apply these skills for operationalizing machine learning workloads.
Workloads that require low-latency, interactive model predictions (real-time model inference).

Architecture

The high-level architecture for an MLOps platform based on the approach outlined in this repository is as follows.

At a high level, this solution design addresses each stage of the machine learning lifecycle:

Data Preparation: this includes sourcing, cleaning, and transforming the data for processing and analysis. Data can live in a data lake or data warehouse and be stored in a feature store after it's curated.
Model Development: this includes core components of the model development process such as experiment tracking and model registration using MLflow.
Model Deployment: this includes implementing a CI/CD pipeline to containerize machine learning models as API services. These services are deployed to Azure Kubernetes Service clusters for end users to consume.
Model Monitoring: this includes monitoring the API performance and model data drift by analyzing log telemetry with Azure Monitor.
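
To make the data drift part of the Model Monitoring stage concrete, the following is a minimal sketch of one possible drift check. It assumes feature values have already been extracted from the logged telemetry and uses the Population Stability Index (PSI) as the drift metric. The choice of PSI, the function name `population_stability_index`, and the bin count are illustrative assumptions and are not taken from this repository.

```python
import math
from typing import List, Sequence


def population_stability_index(
    baseline: Sequence[float],
    current: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Return the PSI between a baseline sample and a current sample of one feature.

    Assumption: PSI is used here only as an example drift metric.
    """
    lo, hi = min(baseline), max(baseline)
    # Equal-width bins over the baseline range; fall back to 1.0 for a constant baseline.
    width = (hi - lo) / bins or 1.0

    def bin_fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Values outside the baseline range are clamped into the first or last bin.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # eps keeps empty bins from causing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    expected = bin_fractions(baseline)
    actual = bin_fractions(current)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))
```

A PSI of zero means the two samples fall into the bins in identical proportions, and larger values indicate a larger shift. The alerting threshold is workload-specific and would need to be chosen by the team operating the model.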

NOTE:

The proof of concept in this repository, which is documented in the implementation guide, addresses only the online (or real-time) inference workloads depicted in the above high-level design. Batch inference workloads are not covered.
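
As a rough illustration of what an online inference API service can look like before it is containerized, here is a minimal sketch that uses only the Python standard library. The `/predict` route, the request body shape, the `predict` stand-in model, and the log fields are all hypothetical; the implementation in this repository may use a different web framework and model-loading approach.

```python
import json
import logging
import time
from typing import Callable, Iterable, List

logger = logging.getLogger("inference")


def predict(features: List[float]) -> float:
    """Hypothetical stand-in for a trained model: returns the mean of the inputs."""
    return sum(features) / len(features)


def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """WSGI entry point. Expects POST /predict with a JSON body like {"features": [1.0, 2.0]}."""
    started = time.perf_counter()
    if environ.get("PATH_INFO") != "/predict" or environ.get("REQUEST_METHOD") != "POST":
        status, payload = "404 Not Found", {"error": "not found"}
    else:
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
            body = json.loads(environ["wsgi.input"].read(length))
            status, payload = "200 OK", {"prediction": predict(body["features"])}
        except (ValueError, KeyError, TypeError, ZeroDivisionError):
            # Malformed JSON, a missing "features" key, or non-numeric/empty input.
            status, payload = "400 Bad Request", {"error": "invalid request body"}

    # One structured log line per request, so a log collector can track API performance.
    latency_ms = (time.perf_counter() - started) * 1000
    logger.info(json.dumps({"status": status, "latency_ms": round(latency_ms, 3)}))

    data = json.dumps(payload).encode("utf-8")
    start_response(
        status,
        [("Content-Type", "application/json"), ("Content-Length", str(len(data)))],
    )
    return [data]
```

Because `app` is a plain WSGI callable, it can be served locally with `wsgiref.simple_server.make_server` or by any WSGI server inside a container image. The per-request log line is one way to produce telemetry for the monitoring stage.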

Components

The following components are used as part of this design:

Azure Databricks: easy and collaborative Apache Spark-based big data analytics service designed for data science and data engineering.
Azure Kubernetes Service: simplified deployment and management of Kubernetes by offloading the operational overhead to Azure.
Azure Container Registry: managed, private Docker registry service based on the open-source Docker Registry 2.0.
Azure Data Lake Storage Gen2: scalable solution optimized for storing massive amounts of unstructured data.
Azure Monitor: a comprehensive solution for collecting, analyzing, and acting on telemetry from your workloads.
MLflow: open-source solution integrated within Databricks for managing the end-to-end machine learning life cycle.
Azure API Management: a fully managed service that enables customers to publish, secure, transform, maintain, and monitor APIs.
Azure Application Gateway: a web traffic load balancer that enables you to manage traffic to your web applications.
Azure DevOps or GitHub: solutions for implementing DevOps practices to enforce automation and compliance with your workload development and deployment pipelines.

NOTE:

When implementing a CI/CD pipeline, different tools such as Azure Pipelines or GitHub Actions can be used.

The services covered in this design are only a subset of a much larger family of Azure services.

Specific business requirements for your analytics use case could require the use of different services or features that are not considered in this design.

Considerations

Before implementing this solution, consider the following factors:

This solution is designed for teams who require a high degree of customization and have extensive expertise deploying and managing Kubernetes workloads. If your data science team does not have this expertise, consider deploying models to another service such as Azure Machine Learning.
The Machine Learning DevOps Guide presents best practices and learnings on adopting ML operations (MLOps) in the enterprise with Azure Machine Learning.
Follow the recommendations and guidelines defined in the Azure Well-Architected Framework to improve the quality of your Azure solutions.

Pricing

All services deployed in this solution use a consumption-based pricing model. The Azure pricing calculator can be used to estimate costs for a specific scenario. For other considerations, see Cost Optimization in the Well-Architected Framework.

Related resources

You may also find these Architecture Center articles useful:

Machine Learning Operations maturity model
Team Data Science Process for data scientists
Modern analytics architecture with Azure Databricks
Building A Clinical Data Drift Monitoring System With Azure DevOps, Azure Databricks, And MLflow
