|
| 1 | +--- |
| 2 | + date: 2025-06-03 |
| 3 | + title: Deploying the RAG-LLM GitOps Pattern on Azure |
| 4 | + summary: How to deploy the RAG-LLM GitOps Validated Pattern on an ARO cluster |
| 5 | + author: Drew Minnear |
| 6 | + blog_tags: |
| 7 | + - patterns |
| 8 | + - how-to |
| 9 | + - Azure |
| 10 | + - Azure SQL Server |
| 11 | + - ARO |
| 12 | + - rag-llm-gitops |
| 13 | +--- |
| 14 | +:toc: |
| 15 | +:imagesdir: /images |
| 16 | + |
| 17 | +[IMPORTANT] |
| 18 | +==== |
| 19 | +Currently, the Azure SQL (MSSQL) support and related Azure deployment improvements are available only in the branch https://github.com/dminnear-rh/rag-llm-gitops/tree/use-mssql-db[`use-mssql-db`] of my fork. |
| 20 | + |
| 21 | +You must fork or base your deployment off this branch until the changes in https://github.com/validatedpatterns/rag-llm-gitops/pull/66[PR #66] are merged into the main pattern repository. |
| 22 | +==== |
| 23 | + |
| 24 | +== Prerequisites |
| 25 | + |
| 26 | +Before you start, ensure the following: |
| 27 | + |
| 28 | +* You are logged in to an existing ARO cluster. |
| 29 | +* Your Azure subscription has sufficient quota for GPU instances (default: `Standard_NC8as_T4_v3`, which requires at least 8 vCPUs of quota per node). |
| 30 | +* You've created a token on https://huggingface.co[Hugging Face] and accepted the terms of the model you'll deploy. By default, the pattern uses the https://huggingface.co/solidrust/Mistral-7B-Instruct-v0.3-AWQ[Mistral-7B-Instruct-v0.3-AWQ] model. |
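The first two prerequisites can be checked from a terminal before going further. The sketch below is one way to do that, assuming both the `oc` and `az` CLIs are installed and logged in. `check_prereqs` is a hypothetical helper name, not something the pattern ships, and the region argument is whatever region your ARO cluster runs in.

```shell
# Hypothetical helper (not part of the pattern): confirms the oc session
# and prints regional compute quota so the GPU VM family can be inspected.
check_prereqs() {
  region="${1:-}"
  if [ -z "$region" ]; then
    echo "usage: check_prereqs <azure-region>" >&2
    return 2
  fi
  # Fails if the current oc session is not authenticated.
  oc whoami || return 1
  # Prints the API server URL so you can confirm it is the intended cluster.
  oc whoami --show-server || return 1
  # Lists current usage and limits per VM family for the region.
  az vm list-usage --location "$region" --output table
}
```

For example, `check_prereqs eastus` prints your user, the cluster API URL, and a quota table. In that table, look for the row covering the T4 VM family and compare its limit against the number of vCPUs your GPU nodes will need.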
| 31 | + |
| 32 | +TIP: Model and database defaults are defined in `overrides/values-Azure.yaml`. You can override them by editing this file. |
| 33 | + |
| 34 | +== Database Options |
| 35 | + |
| 36 | +The pattern defaults to using Azure SQL Server. Alternatively, you may deploy a local Redis, PostgreSQL, or Elasticsearch instance within your cluster. |
| 37 | + |
| 38 | +To select your database type, edit `overrides/values-Azure.yaml`: |
| 39 | + |
| 40 | +[source,yaml] |
| 41 | +---- |
| 42 | +global: |
| 43 | +  db: |
| 44 | +    type: "AZURESQL" # Options: AZURESQL, REDIS, EDB, ELASTIC |
| 45 | +---- |
| 46 | + |
| 47 | +WARNING: Choosing `REDIS`, `EDB` (PostgreSQL), or `ELASTIC` (Elasticsearch) deploys the database inside your cluster. Ensure the cluster has sufficient resources available. |
| 48 | + |
| 49 | +== Deploying Azure SQL Server (Optional) |
| 50 | + |
| 51 | +Follow these steps if you plan to use Azure SQL Server: |
| 52 | + |
| 53 | +. Navigate to the Azure portal and create a new SQL Database server. |
| 54 | +. Select `Use SQL authentication`. |
| 55 | +. Record your `Server name`, `Server admin login`, and `Password` (these will be needed later). |
| 56 | +. On the *Networking* tab, set `Allow Azure services and resources to access this server` to `Yes`. |
| 57 | +. Click *Review + create*, and then *Create*. |
| 58 | + |
| 59 | +Wait until the server status shows as active before proceeding. |
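If you prefer the command line to the portal, the same server can be created with the Azure CLI. This is a minimal sketch rather than the pattern's documented procedure: `create_azure_sql_server` is a hypothetical wrapper, and the resource group, region, and firewall rule name are values you choose. It assumes `az` is installed and logged in to the right subscription.

```shell
# Hypothetical wrapper (not part of the pattern) around two az commands:
# create a logical SQL server with SQL authentication, then allow Azure
# services to reach it.
create_azure_sql_server() {
  if [ "$#" -ne 5 ]; then
    echo "usage: create_azure_sql_server <resource-group> <location> <server-name> <admin-user> <admin-password>" >&2
    return 2
  fi
  resource_group="$1"
  location="$2"
  server_name="$3"
  admin_user="$4"
  admin_password="$5"

  # Supplying an admin user and password creates a SQL-authentication login.
  az sql server create \
    --resource-group "$resource_group" \
    --location "$location" \
    --name "$server_name" \
    --admin-user "$admin_user" \
    --admin-password "$admin_password" || return 1

  # A rule from 0.0.0.0 to 0.0.0.0 is the CLI equivalent of the portal's
  # "Allow Azure services and resources to access this server" setting.
  # The rule name itself is arbitrary.
  az sql server firewall-rule create \
    --resource-group "$resource_group" \
    --server "$server_name" \
    --name AllowAllWindowsAzureIps \
    --start-ip-address 0.0.0.0 \
    --end-ip-address 0.0.0.0
}
```

The fully qualified server name you will need later is the server name followed by `.database.windows.net`. Note that a password passed as an argument can end up in your shell history, so consider reading it from a prompt or an environment variable instead.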
| 60 | + |
| 61 | +== Creating Required Secrets |
| 62 | + |
| 63 | +Before installation, create a secrets YAML file at `~/values-secret-rag-llm-gitops.yaml`. Populate it as follows: |
| 64 | + |
| 65 | +[source,yaml] |
| 66 | +---- |
| 67 | +version: "2.0" |
| 68 | + |
| 69 | +secrets: |
| 70 | +  - name: hfmodel |
| 71 | +    fields: |
| 72 | +      - name: hftoken |
| 73 | +        value: hf_your_huggingface_token |
| 74 | +  - name: azuresql |
| 75 | +    fields: |
| 76 | +      - name: user |
| 77 | +        value: adminuser |
| 78 | +      - name: password |
| 79 | +        value: your_password |
| 80 | +      - name: server |
| 81 | +        value: yourservername.database.windows.net |
| 82 | +---- |
| 83 | + |
| 84 | +Replace these placeholders with your actual credentials: |
| 85 | + |
| 86 | +* `hftoken`: Your Hugging Face token (you must first accept the model's terms). |
| 87 | +* `user`: Azure SQL server admin username. |
| 88 | +* `password`: Azure SQL admin password. |
| 89 | +* `server`: Fully qualified Azure SQL server name. |
| 90 | + |
| 91 | +TIP: If you're not using Azure SQL Server, omit the entire `azuresql` section. |
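Because this file holds credentials in plain text, it is worth creating it with owner-only permissions. The sketch below generates the file shown above from environment variables. `write_secrets_file` and the variable names `HF_TOKEN`, `AZURESQL_USER`, `AZURESQL_PASSWORD`, and `AZURESQL_SERVER` are illustrative assumptions, not names the pattern defines. If you are not using Azure SQL Server, remove the `azuresql` stanza and the corresponding variable checks.

```shell
# Hypothetical helper (not part of the pattern): writes the secrets file
# from environment variables, readable only by the current user.
write_secrets_file() {
  target="${1:-$HOME/values-secret-rag-llm-gitops.yaml}"

  # Refuse to write a file with empty credentials.
  for name in HF_TOKEN AZURESQL_USER AZURESQL_PASSWORD AZURESQL_SERVER; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      return 1
    fi
  done

  # Create the file with restrictive permissions before any secret is written.
  ( umask 077 && : > "$target" ) || return 1
  chmod 600 "$target" || return 1

  # Values are single-quoted in the YAML so characters such as ':' or '#'
  # are kept literally. A single quote inside a value must be doubled by hand.
  cat > "$target" <<EOF
version: "2.0"

secrets:
  - name: hfmodel
    fields:
      - name: hftoken
        value: '${HF_TOKEN}'
  - name: azuresql
    fields:
      - name: user
        value: '${AZURESQL_USER}'
      - name: password
        value: '${AZURESQL_PASSWORD}'
      - name: server
        value: '${AZURESQL_SERVER}'
EOF
}
```

With the four variables exported, running `write_secrets_file` with no argument writes to `~/values-secret-rag-llm-gitops.yaml`. Pass a path as the first argument to write elsewhere.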
| 92 | + |
| 93 | +== Creating GPU Nodes (MachineSet) |
| 94 | + |
| 95 | +Your cluster requires GPU nodes with a specific taint to host the vLLM inference service: |
| 96 | + |
| 97 | +[source,yaml] |
| 98 | +---- |
| 99 | +- key: odh-notebook |
| 100 | +  value: "true" |
| 101 | +  effect: NoSchedule |
| 102 | +---- |
| 103 | + |
| 104 | +=== Creating GPU Nodes Automatically |
| 105 | + |
| 106 | +If no GPU nodes exist, run this command to provision one default GPU node: |
| 107 | + |
| 108 | +[source,shell] |
| 109 | +---- |
| 110 | +./pattern.sh make create-gpu-machineset-azure |
| 111 | +---- |
| 112 | + |
| 113 | +This creates a single `Standard_NC8as_T4_v3` GPU node. |
| 114 | + |
| 115 | +=== Customizing GPU Node Creation |
| 116 | + |
| 117 | +To control GPU node specifics, provide additional parameters: |
| 118 | + |
| 119 | +[source,shell] |
| 120 | +---- |
| 121 | +./pattern.sh make create-gpu-machineset-azure GPU_REPLICAS=3 OVERRIDE_ZONE=2 GPU_VM_SIZE=Standard_NC16as_T4_v3 |
| 122 | +---- |
| 123 | + |
| 124 | +Parameters available: |
| 125 | + |
| 126 | +* `GPU_REPLICAS`: Number of GPU nodes to provision. |
| 127 | +* `OVERRIDE_ZONE`: Availability zone (optional). |
| 128 | +* `GPU_VM_SIZE`: Azure VM SKU for GPU nodes. |
| 129 | + |
| 130 | +The script automatically applies the required taint. The NVIDIA GPU Operator installed by the pattern handles CUDA driver installation on the GPU nodes. |
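Once the new machines have joined the cluster, you can confirm that they carry the taint before installing. The sketch below is one way to check. `gpu_nodes_missing_taint` is a hypothetical helper, and it assumes the nodes carry the standard `node.kubernetes.io/instance-type` label with the VM size as its value. It checks only the taint key, not the value or effect.

```shell
# Hypothetical helper (not part of the pattern): prints the names of nodes
# of the given VM size that do NOT have a taint with key "odh-notebook".
gpu_nodes_missing_taint() {
  vm_size="${1:-Standard_NC8as_T4_v3}"
  # One row per node: name, then a comma-separated list of taint keys
  # (or <none> when the node has no taints).
  oc get nodes -l "node.kubernetes.io/instance-type=${vm_size}" \
    -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key' \
    --no-headers \
    | awk '$2 !~ /(^|,)odh-notebook(,|$)/ { print $1 }'
}
```

Empty output from `gpu_nodes_missing_taint` means every matching node is tainted. If it prints nothing and you expected nodes, run `oc get machinesets -n openshift-machine-api` to see whether the machines are still provisioning.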
| 131 | + |
| 132 | +== Installing the Pattern |
| 133 | + |
| 134 | +Ensure you've completed the following steps: |
| 135 | + |
| 136 | +. Logged in to your ARO cluster. |
| 137 | +. Created your database (Azure SQL Server) if applicable. |
| 138 | +. Prepared the secrets YAML file (`~/values-secret-rag-llm-gitops.yaml`). |
| 139 | +. Provisioned GPU nodes with the required taint. |
| 140 | + |
| 141 | +Finally, install the pattern by running: |
| 142 | + |
| 143 | +[source,shell] |
| 144 | +---- |
| 145 | +./pattern.sh make install |
| 146 | +---- |
| 147 | + |
| 148 | +Your RAG-LLM GitOps Validated Pattern will now deploy to your Azure Red Hat OpenShift cluster. |