Running Metaflow with Airflow and MinIO on Minikube

This guide explains how to set up Metaflow with Airflow and MinIO on Minikube.


Ensure the following software is installed on your system:

Step-by-step Instructions

  1. Start Minikube by executing the following command:

    minikube start --cpus 6 --memory 10240
  2. Add the MinIO Helm repository and update it:

    helm repo add minio
    helm repo update
  3. Install MinIO using Helm:

    helm install --set resources.requests.memory=512Mi --set replicas=1 --set persistence.enabled=false --set mode=standalone --set rootUser=rootuser,rootPassword=rootpass123 minio-s3 minio/minio
  4. Set up port forwarding for the MinIO service:

    kubectl port-forward svc/minio-s3 9000 --namespace default

    The MinIO service will now be accessible at http://localhost:9000.

  5. Install the metaflow and kubernetes Python packages:

    pip install metaflow kubernetes
  6. Create a bucket named metaflow-test in MinIO using the Python script. The --access-key and --secret-key values correspond to the rootUser and rootPassword set in Step 3.

    python --access-key rootuser --secret-key rootpass123 --bucket-name metaflow-test
  7. Create an ngrok tunnel to the port-forwarded MinIO service in a separate terminal window:

    ngrok http 9000
  8. Create a Kubernetes secret for MinIO. This secret will be used by Metaflow Tasks and Metaflow UI running on Kubernetes to access data stored in MinIO:

    kubectl create secret generic minio-secret --from-literal=AWS_ACCESS_KEY_ID=rootuser --from-literal=AWS_SECRET_ACCESS_KEY=rootpass123
  9. Enable the ingress addon on Minikube, then install the Metaflow services in the default namespace:

    minikube addons enable ingress
    git clone git@github.com:outerbounds/metaflow-tools.git .mf-tools
    helm upgrade --install metaflow .mf-tools/k8s/helm/metaflow \
        --timeout 15m0s \
        --namespace default \
        --set metaflow-ui.uiBackend.metaflowDatastoreSysRootS3=s3://metaflow-test/metaflow \
        --set metaflow-ui.uiBackend.metaflowS3EndpointURL="<NGROK-TUNNEL-URL-COMES-HERE>" \
        --set "metaflow-ui.envFrom[0]" \
        --set metaflow-ui.ingress.className=nginx \
        --set metaflow-ui.ingress.enabled=true
  10. Create a Metaflow configuration file in the ~/.metaflowconfig/ directory and name it config_airflow_minio.json, which matches the airflow_minio profile used in Step 12:

        "METAFLOW_DATATOOLS_S3ROOT": "s3://metaflow-test/data",
        "METAFLOW_DEFAULT_METADATA" : "service",
        "METAFLOW_KUBERNETES_SECRETS": "minio-secret",
        "METAFLOW_SERVICE_INTERNAL_URL": "http://metaflow-metaflow-service.default.svc.cluster.local:8080",
  11. Start the Airflow installation in a separate terminal window:

    pip install apache-airflow apache-airflow-providers-cncf-kubernetes
    mkdir -p ~/airflow/dags && airflow standalone
  12. Create the Airflow DAG file for the

    export METAFLOW_PROFILE=airflow_minio
    python airflow create
    cp ~/airflow/dags/
  13. Ensure DAGs get loaded with airflow dags reserialize.

  14. The DAG named HelloFlow will appear in the Airflow UI. Trigger it from there.
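
Step 6 relies on a Python script to create the bucket. For readers who want to see what such a script might involve, here is a minimal sketch, assuming boto3 is available and MinIO is reachable through the port-forward from Step 4. It is not the script referenced in the guide. The --endpoint-url flag, its default, and the function names are hypothetical and were added for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse the flags shown in Step 6, plus a hypothetical --endpoint-url."""
    parser = argparse.ArgumentParser(description="Create a bucket in MinIO.")
    parser.add_argument("--access-key", required=True)
    parser.add_argument("--secret-key", required=True)
    parser.add_argument("--bucket-name", required=True)
    # Assumption: MinIO is port-forwarded to localhost:9000 as in Step 4.
    parser.add_argument("--endpoint-url", default="http://localhost:9000")
    return parser.parse_args(argv)


def create_bucket(args):
    """Create the bucket unless a HEAD request shows it is already there."""
    # Imported lazily so that argument parsing works without boto3 installed.
    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client(
        "s3",
        endpoint_url=args.endpoint_url,
        aws_access_key_id=args.access_key,
        aws_secret_access_key=args.secret_key,
        region_name="us-east-1",  # assumption: MinIO's default region
    )
    try:
        s3.head_bucket(Bucket=args.bucket_name)
        print(f"Bucket {args.bucket_name} already exists")
    except ClientError:
        # head_bucket raises ClientError when the bucket is missing
        # (or not accessible); try to create it in that case.
        s3.create_bucket(Bucket=args.bucket_name)
        print(f"Created bucket {args.bucket_name}")
```

To use it as a command-line script, call create_bucket(parse_args()) at the bottom of the file and pass the same flags as in Step 6.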
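
Step 9 needs the public URL of the ngrok tunnel started in Step 7. One way to look it up programmatically is through the ngrok agent's local API. The sketch below assumes the agent is running with its default local address (127.0.0.1:4040). The helper names are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumption: the ngrok agent serves its local API on the default address.
NGROK_API = "http://127.0.0.1:4040/api/tunnels"


def pick_public_url(payload, local_port=9000):
    """Return the public URL of the tunnel forwarding to local_port, or None."""
    for tunnel in payload.get("tunnels", []):
        addr = tunnel.get("config", {}).get("addr", "")
        if addr.endswith(f":{local_port}"):
            return tunnel.get("public_url")
    return None


def fetch_public_url(local_port=9000):
    """Query the running ngrok agent and return the matching public URL."""
    with urlopen(NGROK_API, timeout=5) as response:
        return pick_public_url(json.load(response), local_port)
```

With the tunnel from Step 7 running, print(fetch_public_url()) should show the value to substitute for the placeholder in the helm command. Parsing is kept separate from the HTTP call so the selection logic can be checked without a live tunnel.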