Commit f536ffe

PR: Add Versioning and Data Persistence and Software Versioning Docs
- Add specific anchored versions to all helm installs.
- Add persistent storage volumes to MLFlow, Grafana, and Prometheus
- Add initial software versioning documentation and templating
- Some dead code cleanup.
- Added vllm dashboard to grafana
1 parent d0520c1 commit f536ffe

18 files changed, +1886 -139 lines changed
RELEASE_NOTES.md

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+# Release Notes
+
+The following document contains release notes. Each section will detail added features, what has changed, and what has been fixed. Release notes for the previous 5 releases will be maintained in this document. Click the dropdown next to a release to see its associated notes.
+
+TODO (This file is intended to serve as a template for now):
+<details>
+<summary><strong>1.0.0</strong></summary>
+
+### Added Features
+- Multinode inference
+  - description one
+  - description two
+- Blueprints can utilize RDMA connectivity between nodes
+  - my description one
+  - my description two
+
+### Changed
+- Kuberay replaced by LeaderWorkerSet
+- MLFlow, Prometheus, and Grafana now use persistent volume claims instead of local storage
+- Anchored all versions of helm installs to specific versions which can be found [here](./docs/software_versions/QuickStartVersions.md#helm-chart-versions).
+
+### Fixed
+- Fixed an issue with mlflow deployments where all mlflow experiments would fail because "Experiment 1" did not exist - bug in mlflow and using :memory: as the runs database.
+</details>
+
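
The "Fixed" entry in RELEASE_NOTES.md mentions `:memory:` as the runs database. The commit does not spell out the failure mechanism, so the following is only a sketch of one general property of in-memory SQLite databases that makes them fragile as a shared store. It uses Python's standard `sqlite3` module directly rather than MLflow, and the `experiments` table is a hypothetical stand-in: every new connection to `:memory:` opens its own empty database, so rows written through one connection are not visible through another.

```python
import sqlite3


def experiment_visible_from_second_connection() -> bool:
    """Write a row via one in-memory connection, then look for its table via another."""
    # Hypothetical schema, used only for illustration (not MLflow's real schema).
    writer = sqlite3.connect(":memory:")
    writer.execute("CREATE TABLE experiments (id INTEGER PRIMARY KEY, name TEXT)")
    writer.execute("INSERT INTO experiments (id, name) VALUES (1, 'Experiment 1')")
    writer.commit()

    # A second connection to ":memory:" opens a separate, empty database.
    reader = sqlite3.connect(":memory:")
    tables = reader.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()

    writer.close()
    reader.close()
    return ("experiments",) in tables


if __name__ == "__main__":
    print(experiment_visible_from_second_connection())
```

A file-backed or server-backed store on a persistent volume avoids this class of problem, which is consistent with the move to persistent volume claims listed under "Changed".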

docs/sample_blueprints/bucket_checkpoint_bucket_model_open_dataset.backend.json

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@
     "recipe_container_env": [
         {
             "key": "Mlflow_Endpoint",
-            "value": "http://mlflow.default.svc.cluster.local:5000"
+            "value": "http://mlflow.cluster-tools.svc.cluster.local:5000"
         },
         {
             "key": "Mlflow_Exp_Name",

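The `Mlflow_Endpoint` change in this sample blueprint, and in the other sample blueprints in this commit, moves the MLflow service from the `default` namespace to `cluster-tools`. As a rough sketch of how such in-cluster URLs are put together, assuming the default Kubernetes cluster domain `cluster.local` (the helper names `service_url` and `namespace_of` are hypothetical and not part of this repository):

```python
from urllib.parse import urlparse


def service_url(service: str, namespace: str, port: int, scheme: str = "http") -> str:
    """Build an in-cluster URL of the form <service>.<namespace>.svc.cluster.local."""
    # Assumes the cluster uses the default DNS domain "cluster.local".
    return f"{scheme}://{service}.{namespace}.svc.cluster.local:{port}"


def namespace_of(url: str) -> str:
    """Extract the namespace label from an in-cluster service URL."""
    host = urlparse(url).hostname or ""
    labels = host.split(".")
    if len(labels) != 5 or labels[2:] != ["svc", "cluster", "local"]:
        raise ValueError(f"not a <service>.<namespace>.svc.cluster.local host: {host!r}")
    return labels[1]


if __name__ == "__main__":
    print(service_url("mlflow", "cluster-tools", 5000))
    print(namespace_of("http://mlflow.default.svc.cluster.local:5000"))
```

Blueprints that still point at the `default` namespace would need the same one-line endpoint update shown in the hunk above.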
docs/sample_blueprints/bucket_model_open_dataset.backend.json

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@
     "recipe_container_env": [
         {
             "key": "Mlflow_Endpoint",
-            "value": "http://mlflow.default.svc.cluster.local:5000"
+            "value": "http://mlflow.cluster-tools.svc.cluster.local:5000"
         },
         {
             "key": "Mlflow_Exp_Name",

docs/sample_blueprints/bucket_par_open_dataset.backend.json

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@
     "recipe_container_env": [
         {
             "key": "Mlflow_Endpoint",
-            "value": "http://mlflow.default.svc.cluster.local:5000"
+            "value": "http://mlflow.cluster-tools.svc.cluster.local:5000"
         },
         {
             "key": "Mlflow_Exp_Name",

docs/sample_blueprints/closed_model_open_dataset_hf.backend.json

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@
     "recipe_container_env": [
         {
             "key": "Mlflow_Endpoint",
-            "value": "http://mlflow.default.svc.cluster.local:5000"
+            "value": "http://mlflow.cluster-tools.svc.cluster.local:5000"
         },
         {
             "key": "Mlflow_Exp_Name",

docs/sample_blueprints/open_model_open_dataset_hf.backend.json

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@
     "recipe_container_env": [
         {
             "key": "Mlflow_Endpoint",
-            "value": "http://mlflow.default.svc.cluster.local:5000"
+            "value": "http://mlflow.cluster-tools.svc.cluster.local:5000"
         },
         {
             "key": "Mlflow_Exp_Name",
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+# Blueprints Control Plane Software Versions
+
+The following table describes software versions for tagged releases of blueprints control plane software, with most recent tags listed first.
+
+<details>
+<summary><strong>latest</strong></summary>
+
+## Software Used in Containers
+|Container Name|Provider|Name|Type|Version|
+|:------------:|:------:|:--:|:--:|:-----:|
+|oci-corrino-cp / pod-util-amd64|Oracle|oraclelinux|Container|8|
+|oci-corrino-cp / pod-util-amd64|Python|python311|Programming Language|3.11.11|
+|oci-corrino-cp / pod-util-amd64|Python Pip|python3.11-pip|Package Manager|22.3.1|
+|pod-util-amd64|Oracle|oci-cli|Application|3.12|
+
+--------
+--------
+## Python Packages
+|Package Name|Version|
+|:----------:|:-----:|
+|Django|5.1.3|
+|django-extensions|3.2.3|
+|djangorestframework|3.14.0|
+|gunicorn|22.0.0|
+|jsonschema|4.23.0|
+|kubernetes|30.1.0|
+|packaging|24.0|
+|psycopg2-binary|2.9.10|
+|pytz|2024.1|
+|sqlparse|0.5.0|
+|oci|2.138.1|
+|asgiref|3.8.1|
+|oracledb|2.5.0|
+|prometheus_client|0.21.1|
+</details>
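
The "Python Packages" table above lists exact versions. One way such a table could be kept in sync with a pip requirements file is to derive `name==version` pins from it. The sketch below is hypothetical tooling, not something this commit adds: the function name `table_to_pins` is an assumption, and it only handles two-column Markdown tables shaped like the one above.

```python
from typing import List


def table_to_pins(markdown_table: str) -> List[str]:
    """Turn a two-column |Package Name|Version| Markdown table into pip pins."""
    pins = []
    for line in markdown_table.splitlines():
        # Drop outer pipes (a missing trailing pipe is tolerated) and split into cells.
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != 2:
            continue
        name, version = cells
        # Skip empty names, the header row, and the |:---:|:---:| separator row.
        if not name or name == "Package Name" or set(name) <= set(":-"):
            continue
        pins.append(f"{name}=={version}")
    return pins


if __name__ == "__main__":
    sample = "|Package Name|Version|\n|:----------:|:-----:|\n|Django|5.1.3|\n|gunicorn|22.0.0|"
    print("\n".join(table_to_pins(sample)))
```

The resulting lines use pip's exact-match `==` specifier, so they can be written to a requirements file as-is.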
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+# Portal Versions
+
+TODO
Lines changed: 120 additions & 0 deletions
@@ -0,0 +1,120 @@
+# OCI AI Blueprints Quickstart Software Versions
+
+The following table describes software versions for tagged releases of this quickstart software repository, with most recent tags listed first.
+
+This will be replaced as soon as we start tagging. Wanted framework in place.
+<details>
+<summary><strong>release-2025-04-22</strong></summary>
+
+## Cluster Creation Terraform
+### Terraform / Provider Versions
+|Component Type|Component Name|Component Source|Component Version|
+|:------------:|:------------:|:--------------:|:---------------:|
+|Language|Terraform|hashicorp|>=1.5|
+|Provider|oci|oracle/oci|>=5|
+|Provider|kubernetes|hashicorp/kubernetes|>=2.27|
+|Provider|helm|hashicorp/helm|>=2.12|
+|Provider|tls|hashicorp/tls|>=4|
+|Provider|local|hashicorp/local|>=2.5|
+|Provider|random|hashicorp/random|>=3.6|
+
+### Oracle Services
+|Service|Version|
+|Oracle Kubernetes Engine|v1.31.1|
+
+--------------
+--------------
+
+## OCI AI Blueprints Terraform
+### Terraform / Provider Versions
+|Component Type|Component Name|Component Source|Component Version|
+|:------------:|:------------:|:--------------:|:---------------:|
+|Language|Terraform|hashicorp|>=1.1|
+|Provider|oci|oracle/oci| 4 <= version < 5|
+|Provider|kubernetes|hashicorp/kubernetes|>=2|
+|Provider|helm|hashicorp/helm|>=2|
+|Provider|tls|hashicorp/tls|>=4|
+|Provider|local|hashicorp/local|>=2|
+|Provider|random|hashicorp/random|>=3|
+
+### Helm Chart Versions
+|Chart Name|Version|Chart URL|
+|:--------:|:-----:|:-------:|
+|Grafana|6.47.1|https://grafana.github.io/helm-charts|
+|Prometheus|19.0.1|https://prometheus-community.github.io/helm-charts|
+|Metrics Server|3.8.3|https://kubernetes-sigs.github.io/metrics-server|
+|Ingress Nginx|4.4.0|https://kubernetes.github.io/ingress-nginx|
+|MLFlow|0.16.5|https://community-charts.github.io/helm-charts|
+|NVIDIA GPU Operator|v25.3.0|https://helm.ngc.nvidia.com/nvidia|
+|Keda|2.17.0|https://kedacore.github.io/charts|
+|LeaderWorkerSet|0.1.0|local|
+
+### Container Versions
+|Container|Version|Repository|
+|:--------|:------|:---------|
+|oci-corrino-cp|latest|iad.ocir.io/iduyx1qnmway/corrino-devops-repository|
+|oci-ai-blueprints-portal|latest|iad.ocir.io/iduyx1qnmway/corrino-devops-repository|
+
+### Oracle Services
+|Service|Version|
+|Oracle Autonomous Database|19c|
+
+</details>
+
+<details>
+<summary><strong>release-2025-04-01</strong></summary>
+
+## Cluster Creation Terraform
+### Terraform / Provider Versions
+|Component Type|Component Name|Component Source|Component Version|
+|:------------:|:------------:|:--------------:|:---------------:|
+|Language|Terraform|hashicorp|>=1.5|
+|Provider|oci|oracle/oci|>=5|
+|Provider|kubernetes|hashicorp/kubernetes|>=2.27|
+|Provider|helm|hashicorp/helm|>=2.12|
+|Provider|tls|hashicorp/tls|>=4|
+|Provider|local|hashicorp/local|>=2.5|
+|Provider|random|hashicorp/random|>=3.6|
+
+### Oracle Services
+|Service|Version|
+|Oracle Kubernetes Engine|v1.31.1|
+
+--------------
+--------------
+
+## OCI AI Blueprints Terraform
+### Terraform / Provider Versions
+|Component Type|Component Name|Component Source|Component Version|
+|:------------:|:------------:|:--------------:|:---------------:|
+|Language|Terraform|hashicorp|>=1.1|
+|Provider|oci|oracle/oci| 4 <= version < 5|
+|Provider|kubernetes|hashicorp/kubernetes|>=2|
+|Provider|helm|hashicorp/helm|>=2|
+|Provider|tls|hashicorp/tls|>=4|
+|Provider|local|hashicorp/local|>=2|
+|Provider|random|hashicorp/random|>=3|
+
+### Helm Chart Versions
+|Chart Name|Version|Chart URL|
+|:--------:|:-----:|:-------:|
+|Grafana|6.47.1|https://grafana.github.io/helm-charts|
+|Prometheus|19.0.1|https://prometheus-community.github.io/helm-charts|
+|Metrics Server|3.8.3|https://kubernetes-sigs.github.io/metrics-server|
+|Ingress Nginx|4.4.0|https://kubernetes.github.io/ingress-nginx|
+|MLFlow|0.16.5|https://community-charts.github.io/helm-charts|
+|NVIDIA GPU Operator|v25.3.0|https://helm.ngc.nvidia.com/nvidia|
+|Keda|2.17.0|https://kedacore.github.io/charts|
+
+### Container Versions
+|Container|Version|Repository|
+|:--------|:------|:---------|
+|oci-corrino-cp|latest|iad.ocir.io/iduyx1qnmway/corrino-devops-repository|
+|oci-ai-blueprints-portal|latest|iad.ocir.io/iduyx1qnmway/corrino-devops-repository|
+
+### Oracle Services
+|Service|Version|
+|Oracle Autonomous Database|19c|
+
+</details>
+
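
The provider tables above mix open-ended constraints such as `>=2.27` with a bounded range written as `4 <= version < 5`. To make the meaning of those ranges concrete, here is a simplified sketch of a constraint check. It is not Terraform's own resolver: it handles only numeric dotted versions with `>=`, `<=`, `>`, `<`, and `=`, it ignores pre-release tags and the `~>` operator, and the function name `satisfies` is an assumption. The bounded range is expressed in the comma-separated form `>=4, <5`.

```python
import operator
import re
from typing import Tuple

_OPERATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
    "=": operator.eq,
}
# One clause: an operator followed by a dotted numeric version, optional "v" prefix.
_CLAUSE = re.compile(r"^\s*(>=|<=|>|<|=)\s*v?(\d+(?:\.\d+)*)\s*$")


def _parse(version: str) -> Tuple[int, ...]:
    """Convert '1.31.1' or 'v1.31.1' into (1, 31, 1)."""
    return tuple(int(part) for part in version.strip().lstrip("v").split("."))


def _pad(a: Tuple[int, ...], b: Tuple[int, ...]):
    """Right-pad the shorter tuple with zeros so that 5 compares equal to 5.0.0."""
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


def satisfies(version: str, constraint: str) -> bool:
    """Return True if version meets every comma-separated clause in constraint."""
    have = _parse(version)
    for clause in constraint.split(","):
        match = _CLAUSE.match(clause)
        if match is None:
            raise ValueError(f"unsupported constraint clause: {clause!r}")
        left, right = _pad(have, _parse(match.group(2)))
        if not _OPERATORS[match.group(1)](left, right):
            return False
    return True


if __name__ == "__main__":
    print(satisfies("4.123.0", ">=4, <5"))
    print(satisfies("5.0.0", ">=4, <5"))
```

Under this reading, the two Terraform stacks accept disjoint `oci` provider ranges (`>=5` versus `>=4, <5`), so no single provider version satisfies both.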

docs/software_versions/README.md

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+# Software Versions
+
+Each link provides software versions for tools utilized in the various components of the software managed by Blueprints:
+
+- [OCI AI Blueprints Quickstart Software Versions](./QuickStartVersions.md)
+- [Blueprints Control Plane Software Versions](./ControlPlaneVersions.md)
+- [Blueprints Portal Software Versions](./PortalVersions.md)
+

oci_ai_blueprints_terraform/app-portal.tf

Lines changed: 0 additions & 73 deletions
This file was deleted.
