2 files changed: +8 -6 lines

First changed file (the example's README):

@@ -4,7 +4,8 @@ This example uses SparseML and Compressed-Tensors to create a 2:4 sparse and qua
The model is calibrated and trained with the ultrachat200k dataset.
At least 75GB of GPU memory is required to run this example.

- Follow the steps below, or to run the example as `python examples/llama7b_sparse_quantized/llama7b_sparse_w4a16.py`
+ Follow the steps below one by one in a code notebook, or run the full example script
+ as `python examples/llama7b_sparse_quantized/llama7b_sparse_w4a16.py`

## Step 1: Select a model, dataset, and recipe
In this step, we select which model to use as a baseline for sparsification, a dataset to
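A minimal sketch of what Step 1 boils down to, for readers following along; the model identifier below is a placeholder, while the dataset and recipe names are the ones that appear elsewhere in this diff:

```python
# Step 1 (sketch): choose a baseline model, a dataset, and a sparsification recipe.
model_stub = "path-or-hub-id-of-a-llama-7b-checkpoint"  # placeholder; use your own checkpoint
dataset = "ultrachat-200k"                              # calibration + finetuning dataset
recipe = "2:4_w4a16_recipe.yaml"                        # 2:4 sparsity + W4A16 quantization recipe
```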
@@ -36,7 +37,8 @@ recipe = "2:4_w4a16_recipe.yaml"

## Step 2: Run sparsification using `apply`
The `apply` function applies the given recipe to our model and dataset.
- The hardcoded kwargs may be altered based on each model's needs.
+ The hardcoded kwargs may be altered based on each model's needs. This code snippet should
+ be run in the same Python instance as step 1.
After running, the sparsified model will be saved to `output_llama7b_2:4_w4a16_channel`.

```python
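# Editor's sketch -- the README's actual Step 2 code block is collapsed in this diff view.
# It assumes `apply` is importable from `sparseml.transformers` (the README only names the
# function) and that the keyword arguments below are illustrative, not the example's exact
# hardcoded kwargs.
from sparseml.transformers import apply

output_dir = "output_llama7b_2:4_w4a16_channel"  # save location named in the README text above

apply(
    model=model_stub,        # baseline model chosen in Step 1
    dataset=dataset,         # "ultrachat-200k"
    recipe=recipe,           # "2:4_w4a16_recipe.yaml"
    output_dir=output_dir,
    splits={"calibration": "train_gen[:5%]", "train": "train_gen"},  # assumed split config
    max_seq_length=512,           # assumed sequence length
    num_calibration_samples=512,  # assumed calibration sample count
)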
@@ -67,7 +69,7 @@
### Step 3: Compression

The resulting model will be uncompressed. To save a final compressed copy of the model
- run the following:
+ run the following in the same Python instance as the previous steps.

```python
import torch
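# Editor's sketch -- the remainder of the Step 3 block is collapsed in this diff view.
# Assumed continuation: reload the uncompressed Step 2 output and save a compressed copy.
# `SparseAutoModelForCausalLM` and the `save_compressed` flag are assumptions about the
# SparseML API, not taken from the diff itself.
from sparseml.transformers import SparseAutoModelForCausalLM

model = SparseAutoModelForCausalLM.from_pretrained(
    "output_llama7b_2:4_w4a16_channel", torch_dtype=torch.bfloat16
)
model.save_pretrained("output_llama7b_2:4_w4a16_channel_compressed", save_compressed=True)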
Second changed file (a Python example script, shown in part):

@@ -16,12 +16,12 @@
num_bits: 8
type: "int"
symmetric: true
- strategy: "channel"
+ strategy: "tensor"
input_activations:
num_bits: 8
type: "int"
symmetric: true
- dynamic: True
+ dynamic: true
strategy: "token"
targets: ["Linear"]
"""
@@ -37,7 +37,7 @@
dataset = "ultrachat-200k"

# save location of quantized model out
- output_dir = "./output_llama7b_w8a8_channel_dynamic_compressed"
+ output_dir = "./output_llama7b_w8a8_dynamic_compressed"

# set dataset config parameters
splits = {"calibration": "train_gen[:5%]"}
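# Editor's sketch -- the rest of this example script is collapsed in the diff. The variables
# shown above (dataset, output_dir, splits, and the recipe string) are typically fed into a
# one-shot calibration call; `oneshot` from `sparseml.transformers`, the `recipe` variable
# name, and the remaining kwargs are assumptions, not part of the diff.
from sparseml.transformers import oneshot

oneshot(
    model="path-or-hub-id-of-a-llama-7b-checkpoint",  # placeholder model identifier
    dataset=dataset,              # "ultrachat-200k"
    recipe=recipe,                # assumed name of the W8A8 recipe string closed by """ above
    output_dir=output_dir,        # "./output_llama7b_w8a8_dynamic_compressed"
    splits=splits,                # {"calibration": "train_gen[:5%]"}
    num_calibration_samples=512,  # assumed calibration sample count
)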