
Commit 22cc6be

Author: Sara Adkins
Sparse Quantization Example Clarification (#2334)

* clarify example
* cleanup
* update examples
* update output name

1 parent ffa3852, commit 22cc6be

File tree: 2 files changed, +8 -6 lines


examples/llama7b_sparse_quantized/README.md (+5 -3)

@@ -4,7 +4,8 @@ This example uses SparseML and Compressed-Tensors to create a 2:4 sparse and qua
 The model is calibrated and trained with the ultachat200k dataset.
 At least 75GB of GPU memory is required to run this example.
 
-Follow the steps below, or to run the example as `python examples/llama7b_sparse_quantized/llama7b_sparse_w4a16.py`
+Follow the steps below one by one in a code notebook, or run the full example script
+as `python examples/llama7b_sparse_quantized/llama7b_sparse_w4a16.py`
 
 ## Step 1: Select a model, dataset, and recipe
 In this step, we select which model to use as a baseline for sparsification, a dataset to
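The README's Step 1 amounts to three string assignments. A minimal sketch, assuming a generic Hugging Face model id (the actual example's model stub may differ); the dataset and recipe names are the ones visible in the diff itself:

```python
# Step 1 sketch: pick a baseline model, a calibration dataset, and a recipe.
# The model id below is an assumption for illustration; the dataset name and
# recipe filename are taken from the diff context.
model_id = "meta-llama/Llama-2-7b-hf"  # assumed; substitute your base model
dataset = "ultrachat-200k"
recipe = "2:4_w4a16_recipe.yaml"  # recipe file shown in the hunk header below

print(model_id, dataset, recipe)
```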
@@ -36,7 +37,8 @@ recipe = "2:4_w4a16_recipe.yaml"
 
 ## Step 2: Run sparsification using `apply`
 The `apply` function applies the given recipe to our model and dataset.
-The hardcoded kwargs may be altered based on each model's needs.
+The hardcoded kwargs may be altered based on each model's needs. This code snippet should
+be run in the same Python instance as step 1.
 After running, the sparsified model will be saved to `output_llama7b_2:4_w4a16_channel`.
 
 ```python
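The hunk above clarifies that Step 2's `apply` call must run in the same Python session as Step 1. A hedged sketch of what that call's kwargs could look like, not the verbatim example: the entrypoint import path and the kwarg names are assumptions based on the diff context, and only the dictionary itself is exercised here.

```python
# Sketch of a Step 2 invocation. Kwarg names and values here are assumed
# for illustration; the output_dir matches the save path named in the README.
apply_kwargs = dict(
    model="meta-llama/Llama-2-7b-hf",  # assumed base model id
    dataset="ultrachat-200k",
    recipe="2:4_w4a16_recipe.yaml",
    output_dir="output_llama7b_2:4_w4a16_channel",
    num_calibration_samples=512,  # assumed calibration size
)

# In the real example these kwargs feed the SparseML entrypoint, roughly:
# from sparseml.transformers import apply  # import path is an assumption
# apply(**apply_kwargs)
print(sorted(apply_kwargs))
```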
@@ -67,7 +69,7 @@ apply(
 ### Step 3: Compression
 
 The resulting model will be uncompressed. To save a final compressed copy of the model
-run the following:
+run the following in the same Python instance as the previous steps.
 
 ```python
 import torch
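Step 3 saves a compressed copy of the model produced in Step 2. A minimal sketch of the shape of that step, assuming a `save_compressed` flag on the wrapped `save_pretrained` (an assumption about the SparseML API, shown only in a comment); just the directory naming is actually exercised:

```python
# Step 3 sketch: derive a destination for the compressed copy of the
# sparsified model. The "_compressed" suffix is a hypothetical naming
# convention, not taken from the commit.
output_dir = "output_llama7b_2:4_w4a16_channel"
compressed_dir = output_dir + "_compressed"

# The real save call would look roughly like (API details assumed):
# model.save_pretrained(compressed_dir, save_compressed=True)
print(compressed_dir)
```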

examples/llama7b_w8a8_quantization.py (+3 -3)

@@ -16,12 +16,12 @@
 num_bits: 8
 type: "int"
 symmetric: true
-strategy: "channel"
+strategy: "tensor"
 input_activations:
 num_bits: 8
 type: "int"
 symmetric: true
-dynamic: True
+dynamic: true
 strategy: "token"
 targets: ["Linear"]
 """
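This hunk makes two recipe fixes: the weight quantization strategy changes from `"channel"` to `"tensor"`, and the YAML boolean is lowercased (`True` to `true`, since capitalized `True` is Python syntax, not YAML). A self-contained fragment reproducing the post-fix values; the surrounding keys come from the diff context and the indentation here is illustrative, not the file's actual nesting:

```python
# Post-commit recipe fragment. Keys mirror the diff context; indentation
# is illustrative only.
recipe_fragment = """
weights:
  num_bits: 8
  type: "int"
  symmetric: true
  strategy: "tensor"
input_activations:
  num_bits: 8
  type: "int"
  symmetric: true
  dynamic: true
  strategy: "token"
targets: ["Linear"]
"""

# Sanity-check that both pre-fix values are gone.
assert 'strategy: "channel"' not in recipe_fragment
assert "dynamic: True" not in recipe_fragment
print("recipe fragment ok")
```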
@@ -37,7 +37,7 @@
 dataset = "ultrachat-200k"
 
 # save location of quantized model out
-output_dir = "./output_llama7b_w8a8_channel_dynamic_compressed"
+output_dir = "./output_llama7b_w8a8_dynamic_compressed"
 
 # set dataset config parameters
 splits = {"calibration": "train_gen[:5%]"}
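The final hunk renames the output directory to drop `channel`, keeping it consistent with the recipe's switch to per-tensor weight quantization. Reproduced as plain assignments from the diff context; nothing here is assumed beyond what the hunk shows:

```python
# Post-commit config values from examples/llama7b_w8a8_quantization.py.
output_dir = "./output_llama7b_w8a8_dynamic_compressed"  # "channel" dropped
splits = {"calibration": "train_gen[:5%]"}  # dataset split from the diff

# The new name no longer advertises a channel-wise strategy.
assert "channel" not in output_dir
print(output_dir)
```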
