🔗 Sign up here (optional): https://lu.ma/quyfn4q8
Welcome to this hands-on workshop, where you'll learn to build efficient and scalable data ingestion pipelines.
In this workshop, you’ll learn the core skills required to build and manage data pipelines:
- How to build robust, scalable, and self-maintaining pipelines.
- Best practices, like built-in data governance, for ensuring clean and reliable data flows.
- Incremental loading techniques to refresh data quickly and cost-effectively.
- How to build a Data Lake with dlt.
By the end of this workshop, you'll be able to build data pipelines like a senior data engineer — quickly, concisely, and with best practices baked in.
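To give you a feel for what that looks like, here is a minimal dlt pipeline sketch: it extracts a small in-memory Python list, normalizes it, and loads it into a local DuckDB file. The pipeline, dataset, and table names below are just placeholders, not part of the workshop materials.

```python
import dlt

# a tiny in-memory "source" -- in the workshop you'll extract from real APIs
data = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
]

# the pipeline bundles extraction, normalization, and loading;
# duckdb is used here as a convenient local destination
pipeline = dlt.pipeline(
    pipeline_name="quick_start",
    destination="duckdb",
    dataset_name="mydata",
)

# run() extracts the data, infers a schema, and loads it into the "users" table
load_info = pipeline.run(data, table_name="users")
print(load_info)
```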
🎥 Watch the workshop video: TBA
- Workshop content
- Workshop Colab Notebook
- Homework starter Colab Notebook
- 🌐 Website & Community: Visit our docs, and join discussions in our Slack.
- 💬 Join our Slack Community.
This workshop is structured into three key parts:
1️⃣ Extracting Data – Learn scalable data extraction techniques.
2️⃣ Normalizing Data – Clean and structure data before loading.
3️⃣ Loading & Incremental Updates – Efficiently load and update data.
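As a preview of part 3, here is a rough sketch of incremental loading with dlt's `dlt.sources.incremental` helper. It fetches only GitHub issues updated since the previous run; the repository URL and cursor field are purely illustrative, and pagination is left out for brevity.

```python
import dlt
import requests

@dlt.resource(table_name="issues", write_disposition="merge", primary_key="id")
def github_issues(
    updated_at=dlt.sources.incremental("updated_at", initial_value="1970-01-01T00:00:00Z")
):
    # dlt stores the highest "updated_at" value seen so far in the pipeline state,
    # so each run asks the API only for records changed since the last run
    url = "https://api.github.com/repos/dlt-hub/dlt/issues"
    params = {"since": updated_at.start_value, "per_page": 100, "state": "all"}
    yield requests.get(url, params=params).json()  # single page only, for brevity

pipeline = dlt.pipeline(
    pipeline_name="github_incremental",
    destination="duckdb",
    dataset_name="github",
)
print(pipeline.run(github_issues))
```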
📌 Find the full course file here: Course File
Welcome to the data ingestion workshop of the DataTalks.Club Data Engineering Zoomcamp!
I'm Violetta Mishechkina, Solutions Engineer at dltHub. 👋
- I’ve been working in the data field since 2018, with a background in machine learning.
- I started as a Data Scientist, training ML models and neural networks.
- Over time, I realized that in production, squeezing out the best RMSE isn't as important as model size, infrastructure, and data quality, so I transitioned into MLOps.
- A year ago, I joined dltHub’s Customer Success team and discovered dlt, a Python library that automates 90% of tedious data engineering tasks.
- Now, I work closely with customers and partners to help them integrate and optimize dlt in production.
- I also collaborate with our development team as the voice of the customer, ensuring our product meets real-world data engineering needs.
- My experience across ML, MLOps, and data engineering gives me a practical, hands-on perspective on solving data challenges.
TBA
As you are learning the various concepts of data engineering, consider creating a portfolio project that will further your own knowledge.
By demonstrating the ability to deliver end-to-end, you will have an easier time finding your first role. This helps regardless of whether the hiring manager actually reviews your project, largely because building one gives you a deeper understanding and the vocabulary to talk the talk.
Here are some example projects that others did with dlt:
- Serverless dlt-dbt on cloud functions: Article
- Bird finder: Part 1, Part 2
- Event ingestion on GCP: Article and repo
- Event ingestion on AWS: Article and repo
- Or see one of the many demos created by our working students: Hacker News, GA4 events, an e-commerce pipeline, Google Sheets, Motherduck, MongoDB + Holistics, Deepnote, Prefect, PowerBI vs GoodData vs Metabase, Dagster, ingesting events via GCP webhooks, SAP to Snowflake replication, reading emails and sending a summary to Slack with AI and Kestra, Mode + dlt capabilities, dbt on cloud functions
- If you want to use dlt in your project, check this list of public APIs
If you create a personal project, consider submitting it to our blog - we will be happy to showcase it. Just drop us a line in the dlt Slack.
⭐ Give us a GitHub Star!
💬 Join our Slack Community!
🚀 Let’s build great data pipelines together!