Finetune Controller is a robust and flexible system designed to manage and streamline the fine-tuning of machine learning models on Kubernetes, particularly within OpenShift clusters. This project leverages modern tools and workflows, enabling efficient development and deployment processes for AI-driven applications.
- Local Development: Get started quickly with a streamlined setup process using uv, a high-performance Python package and project manager.
- OpenShift Integration: Simplify deployment and scaling with OpenShift-specific configurations and GPU support for intensive workloads.
- MongoDB Backend: Seamlessly connect to a local or cluster-based MongoDB database.
- Extensibility: Easily integrate with the Kubeflow Training Operator and other components for advanced workflows.
If the cluster is already set up, continue with the steps below. Otherwise, follow the cluster setup instructions first.
We recommend using uv, an extremely fast Python package and project manager:
pip install uv
A container engine such as Docker or Podman
Create a virtual environment and install dependencies:
uv sync
Start a local development MongoDB database (or connect to one on the cluster with port-forward):
docker run -d --rm --name mongodb \
  -e MONGODB_INITDB_ROOT_USERNAME="default-user" \
  -e MONGODB_INITDB_ROOT_PASSWORD="admin123456789" \
  -e MONGODB_INITDB_DATABASE="finetune" \
  -p 27017:27017 \
  mongodb/mongodb-community-server:latest
Alternatively, if MongoDB is already running on the cluster, port-forward that connection to your local machine:
oc port-forward service/mongodb-community-server 27017:27017 -n <namespace>
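With either option, MongoDB is reachable on localhost:27017. The sketch below shows one way to assemble a connection string from the example credentials above. The helper name build_mongo_url is hypothetical, and whether the controller expects credentials inside the URL is an assumption; check .env.example for the format it actually uses.

```python
from urllib.parse import quote_plus


def build_mongo_url(user: str, password: str,
                    host: str = "localhost", port: int = 27017) -> str:
    """Return a mongodb:// URL with percent-escaped credentials (hypothetical helper)."""
    # Reserved characters such as '@' or ':' in credentials must be escaped,
    # otherwise the URL cannot be parsed unambiguously.
    return f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}"


if __name__ == "__main__":
    # Example credentials from the docker run command above.
    print(build_mongo_url("default-user", "admin123456789"))
```

No database name is appended to the URL here, so authentication falls back to the admin database, which is where the root user from the example command is created.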
Connect to the OpenShift cluster with the CLI login command oc login. If the cluster is not already set up, follow the cluster setup steps first.
Create a project-level .env file (see .env.example) and update the variables:
cp .env.example .env
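When .env.example gains new variables, an older .env can silently fall behind. As a rough aid, the sketch below lists variable names present in .env.example but missing from .env. The function names are hypothetical, and the parser handles only plain KEY=VALUE lines; the application itself may load its settings through a different mechanism.

```python
def env_keys(text: str) -> set[str]:
    """Collect variable names from simple KEY=VALUE lines."""
    keys = set()
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        keys.add(line.split("=", 1)[0].strip())
    return keys


def missing_keys(example_text: str, env_text: str) -> list[str]:
    """Return names defined in the example file but absent from the .env file."""
    return sorted(env_keys(example_text) - env_keys(env_text))


if __name__ == "__main__":
    from pathlib import Path

    example, env = Path(".env.example"), Path(".env")
    if example.exists() and env.exists():
        print(missing_keys(example.read_text(), env.read_text()))
```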
Make sure the virtual environment is activated and start the local finetuning controller application.
source .venv/bin/activate
uvicorn app.main:app --reload
This will:
- Start MongoDB with the required configuration
- Build and start the FastAPI server
- Make the application available at http://localhost:8000
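To confirm the server is actually answering, a small check like the following can help. It is a minimal sketch using only the standard library; the function name is_up is hypothetical, and the /docs path assumes FastAPI's default interactive documentation route has not been disabled in this application.

```python
import urllib.error
import urllib.request


def is_up(url: str = "http://localhost:8000/docs", timeout: float = 2.0) -> bool:
    """Return True if the URL answers with a 2xx status, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False


if __name__ == "__main__":
    print("server reachable" if is_up() else "server not reachable")
```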
Set up pre-commit to keep linting and code style up to standard:
uv sync
pre-commit install
The project name can be anything descriptive. These examples use finetune-controller.
oc new-project finetune-controller
oc new-project kubeflow
kubectl apply --server-side -k ""
Requires Kubernetes 1.29 or newer
Follow the latest docs
Install a released version
kubectl apply --server-side -f
To wait for Kueue to be fully available, run:
kubectl wait deploy/kueue-controller-manager -nkueue-system --for=condition=available --timeout=5m
Restart pods
kubectl delete pods -lcontrol-plane=controller-manager -nkueue-system
First, update the namespace of the LocalQueue object in crds/kueue/default-user-queue.yaml. The default namespace is "default".
yq e '.metadata.namespace = "finetune-controller"' -i crds/kueue/default-user-queue.yaml
Apply the default Kueue configuration, or customize it first by following the Kueue docs:
kubectl apply -f crds/kueue/
Example configuration only. Configure MongoDB properly (credentials, storage) for production.
oc new-app -e MONGODB_INITDB_ROOT_USERNAME="default-user" -e MONGODB_INITDB_ROOT_PASSWORD="admin123456789" -e MONGODB_INITDB_DATABASE="finetune" mongodb/mongodb-community-server:latest --namespace finetune-controller
Go to your cluster in the Red Hat console admin dashboard. Add a machine pool of your choosing with the following configuration:
value: <machine pool type or other>
effect: NoSchedule
Node Labels
Key: cluster-api/accelerator
Value: <gpu type e.g. V100 or empty>
Example AWS credentials secret
# aws-credentials.yaml
apiVersion: v1
kind: Secret
metadata:
  name: aws-credentials
type: Opaque
data:
  AWS_ACCESS_KEY_ID: <base64-encoded secret>
  AWS_SECRET_ACCESS_KEY: <base64-encoded secret>
  AWS_REGION: <base64-encoded string>
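Values in a Secret's data field must be base64-encoded. A minimal sketch for producing those values is shown below; the helper name encode_secret_data and the placeholder credentials are illustrative only and are not part of this project.

```python
import base64


def encode_secret_data(values: dict[str, str]) -> dict[str, str]:
    """Base64-encode each value, as expected in the data field of a Secret."""
    return {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in values.items()
    }


if __name__ == "__main__":
    # Placeholder values only; substitute real credentials locally.
    example = {
        "AWS_ACCESS_KEY_ID": "EXAMPLE_KEY_ID",
        "AWS_SECRET_ACCESS_KEY": "EXAMPLE_SECRET",
        "AWS_REGION": "us-east-1",
    }
    for key, value in encode_secret_data(example).items():
        print(f"{key}: {value}")
```

Base64 is an encoding, not encryption, so the resulting file should be kept out of version control just like the raw credentials.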
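Example Docker pull secret
# pull_secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: cr-pull-secret
# Image pull secrets need this type rather than the default Opaque
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: ...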
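The .dockerconfigjson value is a base64-encoded Docker config document containing an auths entry per registry. The sketch below builds such a value; the function name docker_config_json is hypothetical, and the registry, username, and password are placeholders to be replaced with your own.

```python
import base64
import json


def docker_config_json(registry: str, username: str, password: str) -> str:
    """Return a base64-encoded value suitable for the .dockerconfigjson key."""
    # The inner "auth" field is base64 of "username:password".
    auth = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    config = {
        "auths": {
            registry: {"username": username, "password": password, "auth": auth}
        }
    }
    # The whole JSON document is base64-encoded again for the Secret's data field.
    return base64.b64encode(json.dumps(config).encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    # Placeholder registry and credentials.
    print(docker_config_json("registry.example.com", "my-user", "my-token"))
```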
Apply these secrets:
oc apply -f aws-credentials.yaml -n finetune-controller
oc apply -f pull_secret.yaml -n finetune-controller
Create a .env.production file and update the defaults. For this example, set MONGODB_URL=mongodb://mongodb-community-server.finetune-controller.svc.cluster.local:27017
cp .env.example .env.production
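The MONGODB_URL host used in this example follows the standard in-cluster DNS pattern for a Kubernetes Service: service name, then namespace, then svc.cluster.local. If you chose a different project name or service name, the sketch below shows how the value changes. The helper name in_cluster_mongo_url is hypothetical, and cluster.local is assumed to be the cluster domain, which is the Kubernetes default.

```python
def in_cluster_mongo_url(service: str, namespace: str, port: int = 27017,
                         cluster_domain: str = "cluster.local") -> str:
    """Build a mongodb:// URL from a Service's in-cluster DNS name."""
    # Service DNS names take the form <service>.<namespace>.svc.<cluster-domain>.
    host = f"{service}.{namespace}.svc.{cluster_domain}"
    return f"mongodb://{host}:{port}"


if __name__ == "__main__":
    # Service and namespace names from the examples in this document.
    print(in_cluster_mongo_url("mongodb-community-server", "finetune-controller"))
```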
Create the application:
oc new-app --strategy=docker --binary --name finetune-controller --env-file=".env.production" --namespace finetune-controller
Expose the services and patch the TLS config:
oc expose deployment/finetune-controller --port=8000
oc expose svc/finetune-controller --port=8000
oc patch route finetune-controller --type=merge -p '{"spec":{"tls":{"termination":"edge"}}}'
Add cluster role binding permissions to the application
Start a build:
oc start-build finetune-controller --from-dir=. --namespace=finetune-controller
Publish From current project
Publish From git ~HEAD