Commit a371a5e

Update README.md (#87)
1 parent: 6f8dd4a

1 file changed

+3
-3
lines changed


README.md

Lines changed: 3 additions & 3 deletions
````diff
@@ -104,10 +104,10 @@ python run_interactive.py --model_name=$model_name --size=7b --batch_size=64 --m
 
 
 # Run the server
-NOTE: the `--platform=tpu=8` need to specify number of tpu devices (which is 4 for v4-8 and 8 for v5light-8`)
+Here is an example to run the server with llama2 7B config. Note that the `--platform=tpu=8` need to specify number of tpu devices (which is 4 for v4-8 and 8 for v5light-8`).
 
 ```bash
-python run_server.py --param_size=7b --model_name=$model_name --batch_size=128 --max_cache_length=2048 --quantize_weights=$quantize --quantize_kv_cache=$quantize --checkpoint_path=$output_ckpt_dir --tokenizer_path=$tokenizer_path --platform=tpu=8 --model=$model_name
+python run_server.py --param_size=7b --batch_size=128 --max_cache_length=2048 --quantize_weights=$quantize --quantize_kv_cache=$quantize --checkpoint_path=$output_ckpt_dir --tokenizer_path=$tokenizer_path --platform=tpu=8 --model=$model_name --sharding_config="default_shardings/llama.yaml"
 ```
 
 Now you can fire gRPC to it.
@@ -122,7 +122,7 @@ Optional flags:
 the ones in default_shardings directory.
 
 # Run benchmark
-go to the deps/JetStream folder (downloaded during `install_everything.sh`)
+Start the server and then go to the deps/JetStream folder (downloaded during `install_everything.sh`)
 
 ```bash
 cd deps/JetStream
````