(Pronounced: "Do.eks")
Optimized Blueprints for Running Scalable Data Workloads on Kubernetes with Amazon EKS
Workload Type | Repository | Website |
---|---|---|
Data on EKS (This Repo) | github.com/awslabs/data-on-eks | awslabs.github.io/data-on-eks |
AI on EKS (AI/ML Blueprints) | github.com/awslabs/ai-on-eks | awslabs.github.io/ai-on-eks |
Use Data on EKS for analytics, batch, stream, workflow, and data platform workloads. Use AI on EKS for model training, inference, GenAI, and ML orchestration.
To better organize and support data and AI/ML workloads independently, we've split the original Data on EKS project into two focused repositories:
- Data on EKS: Focuses on Data Analytics, ETL, Streaming, Databases, and Query Engines
- AI on EKS: Covers AI/ML, including LLMs, Training/Inference, and Generative AI patterns
Officially announced at KubeCon EU London (April 2025). Full migration complete by end of April 2025.
All future AI-related contributions should be directed to the new AI on EKS GitHub repository.
Build, Scale, and Optimize Data Platforms on Amazon EKS
Welcome to Data on EKS, your launchpad for deploying data platforms at scale on Amazon EKS.
Explore practical examples and patterns for running data workloads on EKS with frameworks such as Apache Spark for distributed data processing, Apache Flink for real-time stream processing, and Apache Kafka for high-throughput distributed messaging. Automate and orchestrate complex workflows with Apache Airflow, and use Amazon EMR on EKS to build resilient clusters that integrate Kubernetes with big data solutions for scalability and performance.
Note: DoEKS is in active development. For upcoming features and enhancements, check out the issues section.
Looking for AI/ML or GenAI solutions on EKS? Check out AI on EKS for patterns with NVIDIA Triton, vLLM, HuggingFace, and more.
The diagram below showcases the wide array of open-source data tools, Kubernetes operators, and frameworks used by DoEKS. It also highlights the seamless integration of AWS Data Analytics managed services with the powerful capabilities of DoEKS open-source tools.
The Data on EKS (DoEKS) solution is organized into the following focus areas:
- Data Analytics on EKS
- Streaming Platforms on EKS
- Scheduler Workflow Platforms on EKS
- Distributed Databases & Query Engine on EKS
In this repository, you'll find a variety of deployment blueprints for creating Data/ML platforms with Amazon EKS clusters. These examples are just a small selection of the available blueprints; visit the DoEKS website for the complete list of options.
Here are some of the ready-to-deploy blueprints included in this repo:
Blueprint | Description |
---|---|
EMR-on-EKS with Karpenter | Run EMR Spark workloads on EKS with cost-effective autoscaling |
Spark Operator with YuniKorn | Self-managed Spark with multi-tenant scheduling |
Apache Flink Operator | Self-managed Flink clusters on EKS |
Apache Kafka with Strimzi | High-throughput Kafka messaging on EKS |
Airflow on EKS | DAG-based data pipeline orchestration using Apache Airflow |
Argo Workflows | Kubernetes-native workflow engine for CI/CD or data pipelines |
For instructions on how to deploy Data on EKS patterns and run sample tests, visit the DoEKS website.
Kubernetes is a widely adopted system for orchestrating containerized software at scale. As more users migrate their data platforms and workloads to Kubernetes, they often face the complexity of managing the Kubernetes ecosystem and selecting the right tools and configurations for their specific needs.
At AWS, we understand the challenges users encounter when deploying and scaling data workloads on Kubernetes. To simplify the process and enable users to quickly conduct proof-of-concepts and build production-ready clusters, we have developed Data on EKS (DoEKS). DoEKS offers opinionated open-source blueprints that provide end-to-end logging and observability, making it easier for users to deploy and manage Spark on EKS, Airflow, Presto, Kafka and other data workloads. With DoEKS, users can confidently leverage the power of Kubernetes for their data needs without getting overwhelmed by its complexity.
DoEKS is maintained by AWS Solution Architects and is not an AWS service. Support is provided on a best-effort basis by the Data on EKS Blueprints community. If you have feedback or feature ideas, or wish to report bugs, please use the Issues section of this GitHub repository.
See CONTRIBUTING for more information.
This library is licensed under the Apache 2.0 License.
We're building an open-source community focused on Data Engineering, Streaming, and Analytics on Kubernetes.
Come join us and contribute to shaping the future of data platforms on Amazon EKS!
Built with ❤️ at AWS.