Trains an XGBoost model end to end for Customer Frequency prediction, orchestrated with Kubeflow on Kubernetes.
- 📝 Table of Contents
- 🧐 About
- 🏁 Getting Started
- 🔧 Running the Training Pipeline
- 🎈 Monitoring
- 🚀 Deployment
- ⛏️ Built Using
- ✍️ Authors
Components in the pipeline include (see the sketch after this list):
- Data preprocessing
- Data Splitting
- Hyperparameter Optimisation and Cross Validation
- Model Saving (Local or Cloud)
- Model testing on new data
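The snippet below is a minimal sketch of how components like these can be wired together with the KFP v2 SDK. The component bodies, the `data_url` parameter, and the `frequency` target column are illustrative assumptions, not the project's actual code.

```python
# Hypothetical sketch of the training pipeline structure (KFP v2 SDK).
from kfp import compiler, dsl
from kfp.dsl import Dataset, Input, Model, Output


@dsl.component(base_image="python:3.10", packages_to_install=["pandas"])
def preprocess(data_url: str, clean: Output[Dataset]):
    # Load the raw CSV and apply basic cleaning (placeholder logic).
    import pandas as pd

    pd.read_csv(data_url).dropna().to_csv(clean.path, index=False)


@dsl.component(base_image="python:3.10", packages_to_install=["pandas", "scikit-learn"])
def split(clean: Input[Dataset], train: Output[Dataset], test: Output[Dataset]):
    # Split the cleaned data into train and test sets.
    import pandas as pd
    from sklearn.model_selection import train_test_split

    df = pd.read_csv(clean.path)
    train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)
    train_df.to_csv(train.path, index=False)
    test_df.to_csv(test.path, index=False)


@dsl.component(base_image="python:3.10",
               packages_to_install=["pandas", "scikit-learn", "xgboost"])
def train_with_hpo(train: Input[Dataset], model: Output[Model]):
    # Tune XGBoost hyperparameters with cross-validated random search,
    # then save the best model (locally here; a cloud upload could go in its place).
    import os

    import pandas as pd
    from sklearn.model_selection import RandomizedSearchCV
    from xgboost import XGBRegressor

    df = pd.read_csv(train.path)
    X, y = df.drop(columns=["frequency"]), df["frequency"]  # assumed target column
    search = RandomizedSearchCV(
        XGBRegressor(),
        param_distributions={
            "max_depth": [3, 5, 7],
            "n_estimators": [100, 300],
            "learning_rate": [0.05, 0.1],
        },
        n_iter=5,
        cv=5,
        scoring="neg_root_mean_squared_error",
    )
    search.fit(X, y)
    os.makedirs(model.path, exist_ok=True)
    search.best_estimator_.save_model(os.path.join(model.path, "model.json"))


@dsl.pipeline(name="customer-frequency-training")
def training_pipeline(data_url: str):
    prep = preprocess(data_url=data_url)
    splits = split(clean=prep.outputs["clean"])
    train_with_hpo(train=splits.outputs["train"])
    # A testing component evaluating on splits.outputs["test"] follows the same pattern.


if __name__ == "__main__":
    # Compile to a YAML spec that can be submitted to Kubeflow Pipelines.
    compiler.Compiler().compile(training_pipeline, "pipeline.yaml")
```

Keeping each step as its own containerised component is what lets Kubeflow schedule, cache, and retry the steps independently.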
You will need a Kubernetes cluster with Kubeflow set up before doing anything.
Make sure you have Poetry installed. If you are on Arch Linux, you can install it by running:
yay -S python-poetry
This is how I set up Kubeflow with Minikube.
Note that I have `kubectl` aliased to `minikube kubectl --`.
minikube start --cpus 4 --memory 4000 --disk-size=10g
curl -s "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh" | bash
git clone [email protected]:kubeflow/manifests.git
cd manifests
while ! kustomize build example | awk '!/well-defined/' | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 10; done
kubectl get pods -A
# To make sure its working
# Give all the pods some time to get running
kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80
To install all dependencies (I highly recommend using a virtual environment), run:
poetry install
poetry run python train_pipeline_execute.py --host http://your_kubeflow_host_url
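For reference, a submission script such as `train_pipeline_execute.py` typically boils down to a few KFP client calls. The sketch below is an assumption about its shape (the `pipeline` module, `training_pipeline` import, and dataset URL are placeholders), not the script's actual contents.

```python
# Hypothetical sketch of a pipeline submission script using the KFP SDK.
import argparse

import kfp

from pipeline import training_pipeline  # placeholder import; use the real pipeline module

parser = argparse.ArgumentParser()
parser.add_argument("--host", required=True,
                    help="Kubeflow Pipelines endpoint, e.g. http://localhost:8080")
args = parser.parse_args()

client = kfp.Client(host=args.host)
client.create_run_from_pipeline_func(
    training_pipeline,
    arguments={"data_url": "https://example.com/customer_data.csv"},  # placeholder URL
)
```

If your Kubeflow deployment sits behind Dex authentication, `kfp.Client` will also need session credentials; the plain `--host` form above matches the port-forwarded setup shown earlier.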
This project uses Aim for monitoring model performance.
After the pipeline completes, run `aim up` in your terminal.
This will open a monitoring dashboard in your browser.
The dashboard will look something like this:
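If you want to log extra metrics from your own components, Aim's Python SDK can be used roughly as shown below. This is only a sketch; the experiment name, hyperparameters, and metric values are placeholders, not results from this project.

```python
# Hypothetical example of tracking metrics with Aim from a training step.
from aim import Run

run = Run(experiment="customer-frequency")  # placeholder experiment name
run["hparams"] = {"max_depth": 5, "n_estimators": 300, "learning_rate": 0.1}

for step, rmse in enumerate([1.9, 1.4, 1.1]):  # dummy values for illustration
    run.track(rmse, name="rmse", step=step, context={"subset": "val"})
```

`aim up` serves the `.aim` repository created in the working directory where runs were logged.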
A Gradio frontend has been built and can be deployed anywhere using the `inference.py` script.
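For orientation, an inference app of this kind usually amounts to loading the saved model and wrapping a predict function in a Gradio interface. The sketch below is an assumption about what `inference.py` might look like; the feature names and model path are placeholders.

```python
# Hypothetical sketch of a Gradio inference app for the trained XGBoost model.
import gradio as gr
import pandas as pd
from xgboost import XGBRegressor

model = XGBRegressor()
model.load_model("model.json")  # placeholder path to the saved model


def predict(recency: float, monetary: float, tenure: float) -> float:
    # Feature names must match the columns the model was trained on.
    features = pd.DataFrame([{"recency": recency, "monetary": monetary, "tenure": tenure}])
    return float(model.predict(features)[0])


demo = gr.Interface(
    fn=predict,
    inputs=[gr.Number(label="Recency"), gr.Number(label="Monetary"), gr.Number(label="Tenure")],
    outputs=gr.Number(label="Predicted purchase frequency"),
)

if __name__ == "__main__":
    demo.launch()
```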
- Python
- XGBoost
- Aim - Monitoring and Tracking
- Kubeflow - Distributed computing for training and deployment
- Gradio - API