Finetune Controller manages the fine-tuning of machine learning models on Kubernetes, particularly on OpenShift clusters. It supports both a local development workflow and deployment to the cluster for AI-driven applications.
- Local Development: get started quickly with uv, a high-performance Python package and project manager.
- OpenShift Integration: OpenShift-specific configuration and GPU machine pools for compute-intensive workloads.
- MongoDB Backend: connect to a local MongoDB instance or to one running on the cluster.
- Extensibility: integrates with the Kubeflow Training Operator and other components, such as Kueue for job queuing.
If the cluster is already set up, continue below; otherwise, follow the cluster setup instructions first.
- We recommend uv, an extremely fast Python package and project manager:
pip install uv
- A container engine such as Docker or Podman.
- Create the virtual environment and install dependencies:
uv sync
- Start a local development MongoDB database, or connect to one on the cluster with port-forwarding.
Local:
docker run -d --rm --name mongodb \
  -e MONGODB_INITDB_ROOT_USERNAME="default-user" \
  -e MONGODB_INITDB_ROOT_PASSWORD="admin123456789" \
  -e MONGODB_INITDB_DATABASE="finetune" \
  -p 27017:27017 \
  mongodb/mongodb-community-server:latest
Cluster: if MongoDB is already running on the cluster, port-forward its service to your local machine:
oc port-forward service/mongodb-community-server 27017:27017 -n <namespace>
- Log in to the OpenShift cluster with the CLI:
oc login
If the cluster is not already set up, follow the cluster setup steps first.
- Create a project-level .env file from .env.example and update the variables:
cp .env.example .env
- Make sure the virtual environment is activated, then start the local Finetune Controller application:
source .venv/bin/activate
uvicorn app.main:app --reload
This will:
- Start the FastAPI server with auto-reload enabled
- Make the application available at http://localhost:8000
The MongoDB instance from the earlier step must already be running.
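To check from a script that the local server is up, you can poll it until it answers. This is a minimal standard-library sketch; `wait_for_http` is a hypothetical helper, not part of the project, and it assumes FastAPI's default interactive docs page at `/docs` has not been disabled.

```python
import time
import urllib.request


def wait_for_http(url: str, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll `url` until it answers with a 2xx/3xx status or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if 200 <= resp.status < 400:
                    return True
        except OSError:
            # URLError, HTTPError, refused connections and socket timeouts are all
            # OSError subclasses: treat them as "not ready yet" and retry.
            pass
        time.sleep(interval)
    return False


if __name__ == "__main__":
    # /docs is FastAPI's default interactive API page; adjust if the app disables it.
    ready = wait_for_http("http://localhost:8000/docs", timeout=10)
    print("server ready" if ready else "server not reachable")
```

Polling with a deadline rather than a fixed sleep keeps the check fast when the server is already up and bounded when it never starts.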
Set up pre-commit to keep linting and code style up to standard:
uv sync
pre-commit install
Create a project for the controller. Any descriptive name works; these examples use finetune-controller:
oc new-project finetune-controller
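Install the Kubeflow Training Operator (v1.8.1) from its standalone overlay into a kubeflow project:
oc new-project kubeflow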
kubectl apply --server-side -k "github.com/kubeflow/training-operator.git/manifests/overlays/standalone?ref=v1.8.1"
Install Kueue, which requires Kubernetes 1.29 or newer. Follow the latest Kueue docs for details; to install a released version:
kubectl apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.10.1/manifests.yaml
To wait for Kueue to be fully available, run:
kubectl wait deploy/kueue-controller-manager -nkueue-system --for=condition=available --timeout=5m
Restart the Kueue controller pods:
kubectl delete pods -lcontrol-plane=controller-manager -nkueue-system
Next, set the namespace of the LocalQueue resource in crds/kueue/default-user-queue.yaml (it defaults to "default") to the project created above:
yq e '.metadata.namespace = "finetune-controller"' -i crds/kueue/default-user-queue.yaml
Apply the default Kueue configuration, or adapt it by following the Kueue docs:
kubectl apply -f crds/kueue/
Deploy MongoDB into the project. This is an example configuration only; configure it properly for production:
oc new-app -e MONGODB_INITDB_ROOT_USERNAME="default-user" -e MONGODB_INITDB_ROOT_PASSWORD="admin123456789" -e MONGODB_INITDB_DATABASE="finetune" mongodb/mongodb-community-server:latest --namespace finetune-controller
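`oc new-app` names the Service after the image, so inside the cluster this database is reachable through Kubernetes DNS as `mongodb-community-server.finetune-controller.svc.cluster.local`. The sketch below builds that address and a `mongodb://` URI from it; `service_host` and `mongo_uri` are hypothetical helpers, the `cluster.local` domain is the common default (an assumption for your cluster), and whether the app expects credentials inside `MONGODB_URL` is not covered here. Because the example root user is created in MongoDB's `admin` database, a URI that carries those credentials typically needs `authSource=admin`.

```python
from typing import Optional
from urllib.parse import quote_plus


def service_host(service: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """In-cluster DNS name of a Service: <service>.<namespace>.svc.<cluster-domain>."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def mongo_uri(
    host: str,
    port: int = 27017,
    username: Optional[str] = None,
    password: Optional[str] = None,
    database: str = "",
    auth_source: Optional[str] = None,
) -> str:
    """Build a mongodb:// URI; credentials are percent-encoded with quote_plus."""
    creds = ""
    if username is not None and password is not None:
        creds = f"{quote_plus(username)}:{quote_plus(password)}@"
    uri = f"mongodb://{creds}{host}:{port}"
    if database or auth_source:
        uri += f"/{database}"  # a "/" is required before any ?options
    if auth_source:
        uri += f"?authSource={auth_source}"
    return uri


if __name__ == "__main__":
    host = service_host("mongodb-community-server", "finetune-controller")
    # Same value as the MONGODB_URL example for .env.production in this guide.
    print(mongo_uri(host))
    # Variant carrying the example root user, which lives in the admin database.
    print(mongo_uri(host, username="default-user", password="admin123456789",
                    database="finetune", auth_source="admin"))
```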
Add a GPU machine pool: open your cluster in the Red Hat console admin dashboard and add a machine pool of your choosing with the following configuration:
Taints
key: nvidia.com/gpu
value: <machine pool type or other>
effect: NoSchedule
Node Labels
key: cluster-api/accelerator
value: <gpu type, e.g. V100, or empty>
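Pods only land on this pool if they tolerate the `nvidia.com/gpu` taint, and they can target it through the `cluster-api/accelerator` label. The sketch below is a simplified model of the Kubernetes toleration-matching rule (it ignores `tolerationSeconds` and NoExecute eviction timing) plus a pod-spec fragment with a matching toleration and `nodeSelector`. The helper names and the `gpu-pool` taint value are hypothetical, and how the controller actually sets these fields on its training pods is not described here.

```python
from typing import Dict


def tolerates(toleration: Dict[str, str], taint: Dict[str, str]) -> bool:
    """Simplified Kubernetes rule for whether one toleration matches one taint."""
    # An empty toleration effect matches every taint effect.
    effect = toleration.get("effect", "")
    if effect and effect != taint["effect"]:
        return False
    # An empty key (only valid with operator Exists) matches every taint key.
    key = toleration.get("key", "")
    if key and key != taint["key"]:
        return False
    operator = toleration.get("operator") or "Equal"  # Kubernetes defaults to Equal
    if operator == "Exists":
        return True  # the taint value is ignored
    if operator == "Equal":
        return toleration.get("value", "") == taint.get("value", "")
    return False


def gpu_scheduling(accelerator: str) -> dict:
    """Pod spec fields that let a pod land on the GPU machine pool."""
    return {
        # Tolerate the nvidia.com/gpu taint regardless of its value.
        "tolerations": [
            {"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"}
        ],
        # Select nodes carrying the accelerator label set on the machine pool.
        "nodeSelector": {"cluster-api/accelerator": accelerator},
    }


if __name__ == "__main__":
    pool_taint = {"key": "nvidia.com/gpu", "value": "gpu-pool", "effect": "NoSchedule"}
    fields = gpu_scheduling("V100")
    print(tolerates(fields["tolerations"][0], pool_taint))  # True
```

Using `operator: Exists` means the pod does not need to know the taint value chosen for each machine pool; the `nodeSelector` then picks the GPU type.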
Example AWS credentials secret:
# aws_credentials.yaml
apiVersion: v1
data:
  AWS_ACCESS_KEY_ID: <base64-encoded access key ID>
  AWS_SECRET_ACCESS_KEY: <base64-encoded secret access key>
  AWS_REGION: <base64-encoded region>
kind: Secret
metadata:
  name: aws-credentials
type: Opaque
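Values under `data` in a Secret must be base64-encoded (Kubernetes also accepts plain-text values under `stringData`). As a sketch, this builds the same `aws-credentials` Secret as JSON, which `oc apply -f` accepts just like YAML. `aws_secret_manifest` is a hypothetical helper and the credential values are placeholders.

```python
import base64
import json


def b64(value: str) -> str:
    """Base64-encode a UTF-8 string, as Secret `data` values require."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def aws_secret_manifest(access_key_id: str, secret_access_key: str, region: str) -> dict:
    """Build the aws-credentials Secret from plain-text values."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": "aws-credentials"},
        "type": "Opaque",
        "data": {
            "AWS_ACCESS_KEY_ID": b64(access_key_id),
            "AWS_SECRET_ACCESS_KEY": b64(secret_access_key),
            "AWS_REGION": b64(region),
        },
    }


if __name__ == "__main__":
    # Placeholder values: substitute real credentials and never commit the output.
    manifest = aws_secret_manifest("<access-key-id>", "<secret-access-key>", "us-east-1")
    print(json.dumps(manifest, indent=2))
```

The output can be piped straight to the cluster, e.g. `python make_aws_secret.py | oc apply -f - -n finetune-controller` (the script name is hypothetical), which avoids writing credentials to disk.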
Example Docker pull secret:
# pull_secret.yaml
apiVersion: v1
data:
  .dockerconfigjson: ...
kind: Secret
metadata:
  name: cr-pull-secret
type: kubernetes.io/dockerconfigjson
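The `.dockerconfigjson` value is a base64-encoded JSON document with an `auths` map, where each registry entry carries `auth = base64("username:password")`. A minimal sketch of building it, with a hypothetical helper name and placeholder registry and credentials; `oc create secret docker-registry cr-pull-secret --docker-server=<registry> --docker-username=<user> --docker-password=<password>` produces the same kind of secret without hand-encoding.

```python
import base64
import json


def dockerconfigjson_value(registry: str, username: str, password: str) -> str:
    """Return the base64 string for data[".dockerconfigjson"] in a pull secret."""
    # Each registry entry carries base64("username:password") in its "auth" field.
    auth = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    config = {"auths": {registry: {"username": username, "password": password, "auth": auth}}}
    # The whole JSON document is base64-encoded once more for the Secret's data field.
    return base64.b64encode(json.dumps(config).encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    # Hypothetical registry and credentials; replace with your own.
    print(dockerconfigjson_value("quay.io", "<username>", "<password>"))
```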
Apply these secrets:
oc apply -f aws_credentials.yaml -n finetune-controller
oc apply -f pull_secret.yaml -n finetune-controller
- Create a .env.production file from .env.example:
cp .env.example .env.production
Then update the defaults. For this example, set:
MONGODB_URL=mongodb://mongodb-community-server.finetune-controller.svc.cluster.local:27017
- Create the application as a binary Docker build:
oc new-app --strategy=docker --binary --name finetune-controller --env-file=".env.production" --namespace finetune-controller
- Expose the service, create a route, and patch the route to use edge TLS termination:
oc expose deployment/finetune-controller --port=8000
oc expose svc/finetune-controller --port=8000
oc patch route finetune-controller --type=merge -p '{"spec":{"tls":{"termination":"edge"}}}'
- Add the cluster role bindings that the application's service account needs.
- Start a build from the current directory:
oc start-build finetune-controller --from-dir=. --namespace=finetune-controller
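The cluster role binding step does not name the roles involved. As a sketch only, assuming the deployment runs as the `default` service account in `finetune-controller` (a Deployment uses `default` when no `serviceAccountName` is set) and using a hypothetical ClusterRole name `finetune-controller-role`, this builds a ClusterRoleBinding manifest that `oc apply -f -` can read from stdin.

```python
import json


def cluster_role_binding(role: str, service_account: str, namespace: str) -> dict:
    """Build a ClusterRoleBinding granting `role` to one namespaced service account."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": f"{namespace}-{service_account}-{role}"},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": role,
        },
        # ServiceAccount subjects use the core API group, so apiGroup is omitted.
        "subjects": [
            {"kind": "ServiceAccount", "name": service_account, "namespace": namespace}
        ],
    }


if __name__ == "__main__":
    # Hypothetical role name; replace it with the ClusterRole the deployment needs.
    binding = cluster_role_binding("finetune-controller-role", "default", "finetune-controller")
    print(json.dumps(binding, indent=2))
```

A CLI shortcut for the same grant is `oc adm policy add-cluster-role-to-user <role> -z default -n finetune-controller`. Which roles the controller actually requires depends on the project and is not specified here.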
Publish from the current project:
./scripts/publish.sh
Publish from git HEAD:
./scripts/publish_git.sh