Commit c4ae2b4

Merge pull request #2241 from pareenaverma/content_review

Tech review of Arcee on GCP

2 parents 0f8ca2d + aa7f7d6

3 files changed: +7 -4 lines changed

content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-gcp/01_launching_an_axion_instance.md

Lines changed: 3 additions & 1 deletion

@@ -11,7 +11,7 @@ layout: learningpathall
 Before you begin, make sure you have the following:
 
 - A Google Cloud account
-- Permission to launch a Compute Engine Axion instance of type `c4a-standard-16` (or larger)
+- Permission to launch a Google Axion instance of type `c4a-standard-16` (or larger)
 - At least 128 GB of available storage
 
 If you're new to Google Cloud, check out the Learning Path [Getting Started with Google Cloud](/learning-paths/servers-and-cloud-computing/csp/google/).

@@ -33,6 +33,8 @@ In the left sidebar, select **OS and storage**.
 
 Under **Operating system and storage**, click on **Change**
 
+Select Ubuntu as the Operating system. For version select Ubuntu 24.04 LTS Minimal.
+
 Set the size of the disk to 128 GB, then click on **Select**.
 
 ## Review and launch the instance
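
Note: the console steps this hunk documents (machine type `c4a-standard-16`, Ubuntu 24.04 LTS Minimal, 128 GB disk) have a command-line equivalent. A minimal sketch, assuming the `ubuntu-minimal-2404-lts-arm64` image family in the `ubuntu-os-cloud` project; the instance name and zone are placeholders, not part of this commit:

# Sketch only, not part of this commit. Pick a zone that offers C4A
# (Axion) capacity; verify the image family with:
#   gcloud compute images list --filter="family~ubuntu-minimal-2404"
gcloud compute instances create afm-axion-demo \
  --zone=us-central1-a \
  --machine-type=c4a-standard-16 \
  --image-family=ubuntu-minimal-2404-lts-arm64 \
  --image-project=ubuntu-os-cloud \
  --boot-disk-size=128GB

Once the instance is running, `gcloud compute ssh afm-axion-demo` connects to it.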

content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-gcp/05_downloading_and_optimizing_afm45b.md

Lines changed: 2 additions & 2 deletions

@@ -86,7 +86,7 @@ This command creates a 4-bit quantized version of the model:
 - `llama-quantize` is the quantization tool from Llama.cpp.
 - `afm-4-5B-F16.gguf` is the input GGUF model file in 16-bit precision.
 - `Q4_0` applies zero-point 4-bit quantization.
-- This reduces the model size by approximately 45% (from ~15GB to ~8GB).
+- This reduces the model size by approximately 70% (from ~15GB to ~4.4GB).
 - The quantized model will use less memory and run faster, though with a small reduction in accuracy.
 - The output file will be `afm-4-5B-Q4_0.gguf`.

@@ -104,7 +104,7 @@ bin/llama-quantize models/afm-4-5b/afm-4-5B-F16.gguf models/afm-4-5b/afm-4-5B-Q8
 
 This command creates an 8-bit quantized version of the model:
 - `Q8_0` specifies 8-bit quantization with zero-point compression.
-- This reduces the model size by approximately 70% (from ~15GB to ~4.4GB).
+- This reduces the model size by approximately 45% (from ~15GB to ~8GB).
 - The 8-bit version provides a better balance between memory usage and accuracy than 4-bit quantization.
 - The output file is named `afm-4-5B-Q8_0.gguf`.
 - Commonly used in production scenarios where memory resources are available.
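
Note: these two hunks swap size estimates that were attached to the wrong commands, making the arithmetic consistent: 4-bit weights are roughly a quarter the size of 16-bit (~15 GB to ~4.4 GB, about 70% smaller), while 8-bit weights are roughly half (~8 GB, about 45% smaller). For reference, a sketch of both invocations from the llama.cpp build directory, following the paths in the hunk header above; the Q4_0 command is inferred from the explanation in the first hunk rather than shown in this commit, so treat it as an assumption:

# llama-quantize usage: llama-quantize <input.gguf> <output.gguf> <type>
# Paths follow the Learning Path layout from the hunk header; adjust to your tree.
bin/llama-quantize models/afm-4-5b/afm-4-5B-F16.gguf models/afm-4-5b/afm-4-5B-Q4_0.gguf Q4_0
bin/llama-quantize models/afm-4-5b/afm-4-5B-F16.gguf models/afm-4-5b/afm-4-5B-Q8_0.gguf Q8_0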

content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-gcp/_index.md

Lines changed: 2 additions & 1 deletion

@@ -17,7 +17,7 @@ learning_objectives:
 - Evaluate model quality by measuring perplexity
 
 prerequisites:
-- A [Google Cloud account](https://console.cloud.google.com/) with permission to launch Axion (`c4a.4x-standard-16` or larger) instances
+- A [Google Cloud account](https://console.cloud.google.com/) with permission to launch Axion (`c4a-standard-16` or larger) instances
 - Basic familiarity with Linux and SSH
 
 author: Julien Simon

@@ -28,6 +28,7 @@ skilllevels: Introductory
 subjects: ML
 arm_ips:
 - Neoverse
+cloud_service_providers: Google Cloud
 tools_software_languages:
 - Google Cloud
 - Hugging Face
