Commit 49114d9

Merge pull request #4 from nod-ai/saienduri/dind-sidecar
Move dind to sidecar and restructure repo.
2 parents 0dd9990 + 76b0ef0 commit 49114d9

5 files changed: +39 -228 lines changed

README.md

Lines changed: 5 additions & 135 deletions
@@ -1,8 +1,8 @@
-# Azure-AKS-ARC-Setup
+# ARC Setup
 
-Documentation for bringing up an Azure Kubernetes cluster integrated with GitHub Actions Runner Controller for IREE Project
+Documentation for bringing up a Kubernetes cluster integrated with GitHub Actions Runner Controller.
 
-### Step 1: Create Azure Kubernetes Service
+### Step 1: Create Azure Kubernetes Service (skip if Kubernetes is already set up on bare metal or another CSP)
 
 Search for Kubernetes Service in the top search bar in Azure Portal. Once there, click Create -> Kubernetes Cluster.
 Choose your resource group and cluster name and proceed with default options for Basics.
@@ -19,7 +19,7 @@ I went with this VM because out of all the 48 core ones, it is the only one that
 
 For the rest of the cluster creation options you can choose the default.
 
-### Step 2: Login to your Cluster
+### Step 2: Login to your Cluster (skip if Kubernetes is already set up on bare metal or another CSP)
 
 Now, to configure the cluster and all the services, you need to connect to the cluster.
 You can do this in your own local dev environment (just make sure you have kube, helm, and azure cli installed)
@@ -44,7 +44,7 @@ helm install arc --namespace "arc" --create-namespace oci://ghcr.io/actions/acti
 ### Step 4: Configure and Deploy Runner Scale Set
 
 ```
-helm upgrade --install "azure-linux-scale" --namespace "<namespace_name_for_runners>" --create-namespace --set githubConfigUrl="<link_to_your_github_repo_or_org>" --set githubConfigSecret.github_token="<your_PAT_token>" oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set -f values.yaml
+helm upgrade --install "azure-linux-scale" --namespace "<namespace_name_for_runners>" --create-namespace oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set -f config-file.yaml
 ```
 
 Please use the values.yaml file from `latest-config-files` folder in this repo for the above command.
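
The updated command no longer passes `githubConfigUrl` or the token via `--set`. The renamed config file instead names a secret (`githubConfigSecret: "iree-secret"`), so that secret presumably has to exist in the runner namespace before the chart is installed. Below is a minimal sketch of creating it. It assumes the `arc-runners` namespace from the config file's deployment comment and the `github_token` key that the gha-runner-scale-set chart reads from a pre-defined secret. `create_runner_secret` and the `KUBECTL` override are hypothetical helpers, not part of this repo.

```shell
#!/bin/sh
# Assumed defaults: both names are taken from the config file in this commit.
NAMESPACE="${NAMESPACE:-arc-runners}"
SECRET_NAME="${SECRET_NAME:-iree-secret}"
KUBECTL="${KUBECTL:-kubectl}"   # set KUBECTL=echo to preview the commands

# create_runner_secret TOKEN
# Creates the namespace and a generic secret holding the GitHub PAT.
create_runner_secret() {
  # Reports an error if the namespace already exists; the secret is still attempted.
  "$KUBECTL" create namespace "$NAMESPACE"
  "$KUBECTL" create secret generic "$SECRET_NAME" \
    --namespace "$NAMESPACE" \
    --from-literal=github_token="$1"
}

# Usage (not executed here):
#   create_runner_secret "<your_PAT_token>"
```

Keeping the token in a secret rather than on the `helm` command line means the same config file can be committed and reused without exposing credentials.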
@@ -58,133 +58,3 @@ The scaling setup is basically the same as the legacy documentation below, so pl
 Also, docker in docker is set up, so in our github workflows we can specify images to use if we want (iree uses cpubuilder_ubuntu_jammy image for example), but as done in iree-turbine, we can just run workflows using the preconfigured custom image here without further setup and that works too.
 
 And you're done (just make sure the label matches the installation name in the workflow) :)
-
-# Legacy ARC Instructions (still works)
-
-### Step 3: Install Cert Manager
-
-```
-helm repo add jetstack https://charts.jetstack.io
-helm repo update
-helm install cert-manager jetstack/cert-manager --namespace cert-manager --create-namespace --version v1.15.3 --set crds.enabled=true
-```
-
-Cert-Manager is a Kubernetes add-on that automates the management and issuance of TLS (Transport Layer Security) certificates.
-This is used for security reasons.
-
-### Step 4: Install Github ARC and Authenticate
-
-I do this using a personal token. So, if you don't have one, create a github token with these permissions:
-
-```
-repo (all)
-admin:org (all) (mandatory for organization-wide runner)
-admin:enterprise (all) (mandatory for enterprise-wide runner)
-admin:public_key - read:public_key
-admin:repo_hook - read:repo_hook
-admin:org_hook
-notifications
-workflow
-```
-
-We will also be adding a webhook server as part of installing the actions-runner-controller, so we need to create a secret for the server to authenticate the github webhooks coming in.
-
-```
-kubectl create namespace actions-runner-system
-kubectl create secret generic github-selfhosted-webhook-token -n actions-runner-system --from-literal=SELFHOSTED_GITHUB_WEBHOOK_SECRET_TOKEN=<your_webhook_secret>
-```
-
-Then, use the following command to install the github ARC
-
-```
-helm repo add actions-runner-controller https://actions-runner-controller.github.io/actions-runner-controller
-helm repo update
-helm upgrade --install --namespace actions-runner-system --set=authSecret.create=true --set=authSecret.github_token="<your_token>" --wait actions-runner-controller actions-runner-controller/actions-runner-controller -f runner-controller.yaml
-```
-
-The yaml file used above configures the actions runner controller service and the webhook server. I've added the yaml file I used (`runner-controller.yaml`) to this repo.
-Here we tell it to configure a bunch of things for the runner controller, and we give it a docker image to use.
-I've set it up to use `summerwind/actions-runner:ubuntu-22.04` which is the latest one provided by the github actions controller with dind enabled.
-This works fine for us and passes all iree-turbine jobs (with no docker) and the iree jobs (these use multiple docker images and work through dind)
-
-### Step 5: Configure GitHub Webhooks
-
-I've set this up to use webhooks to drive the overall scaling of our cluster.
-This scaling is performed based on the number of webhook events received from GitHub.
-Here's an image on how that overall process works:
-
-![image](https://github.com/user-attachments/assets/b11266c5-0c80-4a34-aa18-19a4da255965)
-
-
-To configure this, first we need to expose the github-webhook server created above to the public, so it can receive from GitHub API.
-To do this, get the current configuration of the server using this command:
-`kubectl get svc actions-runner-controller-github-webhook-server -n actions-runner-system -o yaml > current-config.yaml`
-
-Then, open up current-config.yaml and change spec type from `ClusterIP` to `LoadBalancer` in the yaml file and also delete the following lines which aren't necessary after the switch.
-Also change `http` to `https` in the config.
-```
-clusterIP: 10.0.11.74
-clusterIPs:
-- 10.0.11.74
-internalTrafficPolicy: Cluster
-ipFamilies:
-- IPv4
-ipFamilyPolicy: SingleStack
-```
-TODO(saienduri): Find a way to just configure it with a load balancer initially (just webhook server, not the service)
-
-Then, to actually update the service to use the updated config:
-```
-kubectl apply -f current-config.yaml
-```
-
-Now that the server and webhook secret have been configured, you can go to the github org/repo to set up the github side of things.
-Go to "Settings" -> "Webhooks".
-Create a new webhook with address `http://<external-ip>/webhooks` and the content type as `application/json`.
-Then in the secret section add the secret that we added earlier.
-For events, you can pick "Let me select individual events" and then choose push, workflow, and workflow jobs.
-If you don't know the external IP of the webhook server you can run:
-`kubectl get svc -n actions-runner-system`
-
-<img width="566" alt="image" src="https://github.com/user-attachments/assets/76e5d247-c5dd-4aef-aba1-374b789ce7f8">
-
-
-### Step 6: Deploy the Runners
-
-Here, we deploy the runners.
-Specifically, we tell the actions runner controller how many resources we need (45 cores, 50 GB).
-We also give it a runner label that we use in the actual workflow `runs-on:` (I use azure-linux in the yaml)
-You can use the yaml in this repo (runner-deployment.yaml) in the following command:
-
-`kubectl apply -f runner-deployment.yaml`
-
-### Step 7: Configure HRA
-
-This is to configure GitHub Actions Runner Controller's HorizontalRunnerAutoscaler (HRA).
-With the GitHub Actions Runner Controller in a Kubernetes cluster, each runner corresponds to a single container within a pod, and each pod only runs one runner.
-This particular design of the Actions Runner Controller makes sure that each runner operates in its own isolated environment, for the best security of concurrent CI jobs running.
-So, you can think of HRA as a specialized version of HPA, and we don't need it in the GitHub ARC context.
-Here, we tell HRA to scale the GitHub Actions runners based on the webhooks we configured earlier.
-Specifically, we trigger an autoscale every time there is a webhook event for a workflow, so a runner will be requested.
-It will also downscale appropriately.
-You can use the yaml in this repo (horizontal-scale.yaml) for the following command:
-
-`kubectl apply -f horizontal-scale.yaml`
-
-Basically there are two levels of autoscaling.
-HRA adjusts the number of pods to meet the runner demand.
-If the number of pods increases beyond the capacity of the current nodes, the Cluster Autoscaler (the thing we set up at the very start) steps in to scale up the node pool, adding more nodes to provide the necessary resources for the additional pods.
-
-
-Now, change your workflows appropriately to match the labels set in the runner-deployment.yaml and enjoy the AKS + ARC magic :)
-
-
-
-
-
-
-
-
-
-
-

latest-config-files/values.yaml renamed to config-files/iree-org/azure-linux-scale.yaml

Lines changed: 34 additions & 19 deletions
@@ -1,3 +1,8 @@
+# Cluster: Azure SaiScale Kubernetes Cluster
+# Deployment command:
+# helm upgrade --install "azure-linux-scale" --namespace "arc-runners" --create-namespace oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set -f <path-to-this-file>
+githubConfigUrl: https://github.com/iree-org
+githubConfigSecret: "iree-secret"
 ## maxRunners is the max number of runners the auto scaling runner set will scale up to.
 maxRunners: 30
 
@@ -16,11 +21,39 @@ template:
       volumeMounts:
         - name: dind-externals
           mountPath: /home/runner/tmpDir
+    - name: dind
+      image: ghcr.io/saienduri/dind:main
+      restartPolicy: Always
+      command: ["sh", "-c"]
+      args:
+        - |
+          dockerd --host=unix:///var/run/docker.sock --group=${DOCKER_GROUP_GID} &
+          until docker info >/dev/null 2>&1; do sleep 5; done
+          tail -f /dev/null
+      env:
+        - name: DOCKER_GROUP_GID
+          value: "123"
+      securityContext:
+        privileged: true
+      volumeMounts:
+        - name: work
+          mountPath: /home/runner/_work
+        - name: dind-sock
+          mountPath: /var/run
+        - name: dind-externals
+          mountPath: /home/runner/externals
     containers:
     - name: runner
       image: ghcr.io/saienduri/ghascale:main
       imagePullPolicy: Always
-      command: ["/home/runner/run.sh"]
+      command:
+        - /bin/sh
+        - -c
+        - |
+          # Wait for Docker to be ready before starting runner
+          echo "Waiting for docker..."
+          until docker info >/dev/null 2>&1; do sleep 5; done
+          /home/runner/run.sh
       resources:
         requests:
           cpu: 40000m
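
In this hunk the `dind` container is added ahead of `containers:` with `restartPolicy: Always`, which is how Kubernetes expresses a native sidecar (an init container that keeps running). That feature is enabled by default only from Kubernetes v1.29 (it was an alpha feature gate in v1.28), so an older cluster would probably not start this pod as intended. Below is a small sketch of a pre-flight check. `supports_native_sidecars` is a hypothetical helper, and the version string is assumed to come from the cluster, for example via `kubectl version`.

```shell
#!/bin/sh
# supports_native_sidecars VERSION
# Returns 0 if VERSION (e.g. "v1.29.4" or "1.30.2-gke.1") is at least 1.29,
# the first release where sidecar containers are on by default.
supports_native_sidecars() {
  version="${1#v}"            # strip an optional leading "v"
  major="${version%%.*}"      # text before the first dot
  rest="${version#*.}"        # text after the first dot
  minor="${rest%%.*}"         # text before the next dot
  minor="${minor%%[!0-9]*}"   # drop suffixes such as "+" in "29+"
  [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 29 ]; }
}

# Usage (not executed here):
#   supports_native_sidecars "<server version>" || echo "cluster too old for dind sidecar" >&2
```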
@@ -33,24 +66,6 @@ template:
           mountPath: /home/runner/_work
         - name: dind-sock
           mountPath: /var/run
-    - name: dind
-      image: docker:dind
-      args:
-        - dockerd
-        - --host=unix:///var/run/docker.sock
-        - --group=$(DOCKER_GROUP_GID)
-      env:
-        - name: DOCKER_GROUP_GID
-          value: "123"
-      securityContext:
-        privileged: true
-      volumeMounts:
-        - name: work
-          mountPath: /home/runner/_work
-        - name: dind-sock
-          mountPath: /var/run
-        - name: dind-externals
-          mountPath: /home/runner/externals
     volumes:
     - name: work
       emptyDir: {}
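
Both the new `dind` sidecar and the runner container poll `docker info` every 5 seconds with no upper bound, so a daemon that never comes up would leave the pod waiting forever. A bounded variant of the same loop might look like the sketch below. `wait_for` is a hypothetical helper, and the 120-second limit in the usage comment is an arbitrary assumption, not a value from this repo.

```shell
#!/bin/sh
# wait_for TIMEOUT INTERVAL CMD [ARGS...]
# Re-runs CMD every INTERVAL seconds until it succeeds.
# Returns 1 once at least TIMEOUT seconds have been spent sleeping.
# INTERVAL should be a positive whole number of seconds.
wait_for() {
  timeout="$1"
  interval="$2"
  shift 2
  elapsed=0
  until "$@" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out after ${elapsed}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}

# Usage in the runner container (not executed here):
#   wait_for 120 5 docker info && /home/runner/run.sh
```

Failing after a timeout lets Kubernetes restart the container and surface the problem in pod status, instead of holding a runner slot that can never pick up a job.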

legacy-config-files/horizontal-scale.yaml

Lines changed: 0 additions & 15 deletions
This file was deleted.

legacy-config-files/runner-controller.yaml

Lines changed: 0 additions & 42 deletions
This file was deleted.

legacy-config-files/runner-deployment.yaml

Lines changed: 0 additions & 17 deletions
This file was deleted.
