
Commit 086cf90

Jayon02 authored (with co-authors Chayenne and qiujiang chen)

Let lighteval support sglang (#552)

You can use sglang in lighteval tasks:

```bash
lighteval sglang \
    "pretrained=HuggingFaceH4/zephyr-7b-beta,dtype=float16" \
    "helm|bigbench:bbq_lite_json:age_disambig|0|0"
```

Co-authored-by: Chayenne <[email protected]>
Co-authored-by: qiujiang chen <[email protected]>
1 parent bee02f7 commit 086cf90

File tree

11 files changed: +631 -11 lines changed


docs/source/_toctree.yml

+2
```diff
@@ -19,6 +19,8 @@
     title: Add a custom metric
   - local: use-vllm-as-backend
     title: Use VLLM as backend
+  - local: use-sglang-as-backend
+    title: Use SGLang as backend
   - local: evaluate-the-model-on-a-server-or-container
     title: Evaluate on Server
   - local: contributing-to-multilingual-evaluations
```

docs/source/installation.mdx

+3
```diff
@@ -23,6 +23,8 @@ Lighteval has optional dependencies that you can install by specifying the
 appropriate extras group.
 `pip install lighteval[<group>]` or `pip install -e .[<group>]`.
 
+If you want to use lighteval with `sglang`, follow the [sglang install documentation](https://docs.sglang.ai/start/install.html).
+
 | extra name | description |
 |--------------|---------------------------------------------------------------------------|
 | tgi | To use Text Generation Inference API to evaluate your model |
@@ -31,6 +33,7 @@ appropriate extras group.
 | adapters | To evaluate adapters models (delta and peft) |
 | tensorboardX | To upload your results to tensorboard |
 | vllm | To use vllm as backend for inference |
+| sglang | To use sglang as backend for inference |
 | s3 | To upload results to s3 |
```

docs/source/use-sglang-as-backend.mdx

+61
@@ -0,0 +1,61 @@ (new file)

# Use SGLang as backend

Lighteval allows you to use `sglang` as a backend, allowing great speedups.
To use it, simply change the `model_args` to reflect the arguments you want to pass to sglang.

```bash
lighteval sglang \
    "pretrained=HuggingFaceH4/zephyr-7b-beta,dtype=float16" \
    "leaderboard|truthfulqa:mc|0|0"
```

`sglang` is able to distribute the model across multiple GPUs using data parallelism and tensor parallelism.
You can choose the parallelism method by setting it in the `model_args`.

For example, if you have 4 GPUs, you can split the model across them using `tp_size`:

```bash
lighteval sglang \
    "pretrained=HuggingFaceH4/zephyr-7b-beta,dtype=float16,tp_size=4" \
    "leaderboard|truthfulqa:mc|0|0"
```

Or, if your model fits on a single GPU, you can use `dp_size` to speed up the evaluation:

```bash
lighteval sglang \
    "pretrained=HuggingFaceH4/zephyr-7b-beta,dtype=float16,dp_size=4" \
    "leaderboard|truthfulqa:mc|0|0"
```

## Use a config file

For more advanced configurations, you can use a config file for the model.
An example of a config file is shown below and can be found at `examples/model_configs/sglang_model_config.yaml`.

```bash
lighteval sglang \
    "examples/model_configs/sglang_model_config.yaml" \
    "leaderboard|truthfulqa:mc|0|0"
```

```yaml
model: # Model specific parameters
  base_params:
    model_args: "pretrained=HuggingFaceTB/SmolLM-1.7B,dtype=float16,chunked_prefill_size=4096,mem_fraction_static=0.9" # Model args that you would pass in the command line
  generation: # Generation specific parameters
    temperature: 0.3
    repetition_penalty: 1.0
    frequency_penalty: 0.0
    presence_penalty: 0.0
    top_k: -1
    min_p: 0.0
    top_p: 0.9
    max_new_tokens: 256
    stop_tokens: ["<EOS>", "<PAD>"]
```

> [!WARNING]
> In case of OOM issues, you might need to reduce the context size of the
> model, as well as the `mem_fraction_static` and `chunked_prefill_size` parameters.
examples/model_configs/sglang_model_config.yaml

+13

@@ -0,0 +1,13 @@ (new file)

```yaml
model:
  base_params:
    model_args: "pretrained=HuggingFaceTB/SmolLM-1.7B,dtype=float16,chunked_prefill_size=4096,mem_fraction_static=0.9"
  generation:
    temperature: 0.3
    repetition_penalty: 1.0
    frequency_penalty: 0.0
    presence_penalty: 0.0
    top_k: -1
    min_p: 0.0
    top_p: 0.9
    max_new_tokens: 256
    stop_tokens: ["<EOS>", "<PAD>"]
```
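As a hedged illustration (mirroring the yaml-loading branch of `main_sglang.py`, shown further below), this is roughly how the config file above is consumed; the nesting of `generation` under the top-level `model` key is an assumption inferred from that loading logic:

```python
# Sketch only: mirrors the yaml-loading branch of main_sglang.py (below).
# Assumes `generation` sits under the top-level `model` key, since
# GenerationParameters.from_dict receives the "model" sub-dict.
import yaml

from lighteval.models.model_input import GenerationParameters

with open("examples/model_configs/sglang_model_config.yaml") as f:
    config = yaml.safe_load(f)["model"]

model_args = config["base_params"]["model_args"]  # CLI-style key=value string
generation_parameters = GenerationParameters.from_dict(config)
```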

src/lighteval/__main__.py

+2
```diff
@@ -29,6 +29,7 @@
 import lighteval.main_baseline
 import lighteval.main_endpoint
 import lighteval.main_nanotron
+import lighteval.main_sglang
 import lighteval.main_tasks
 import lighteval.main_vllm
 
@@ -65,6 +66,7 @@
 app.command(rich_help_panel="Evaluation Utils")(lighteval.main_baseline.baseline)
 app.command(rich_help_panel="Evaluation Backends")(lighteval.main_nanotron.nanotron)
 app.command(rich_help_panel="Evaluation Backends")(lighteval.main_vllm.vllm)
+app.command(rich_help_panel="Evaluation Backends")(lighteval.main_sglang.sglang)
 app.add_typer(
     lighteval.main_endpoint.app,
     name="endpoint",
```

src/lighteval/main_sglang.py

+159
@@ -0,0 +1,159 @@ (new file)

```python
# MIT License

# Copyright (c) 2024 The SGLang Team

# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:

# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.

# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
import os
from typing import Optional

from typer import Argument, Option
from typing_extensions import Annotated


TOKEN = os.getenv("HF_TOKEN")
CACHE_DIR: str = os.getenv("HF_HOME", "/scratch")

HELP_PANEL_NAME_1 = "Common Parameters"
HELP_PANEL_NAME_2 = "Logging Parameters"
HELP_PANEL_NAME_3 = "Debug Parameters"
HELP_PANEL_NAME_4 = "Modeling Parameters"


def sglang(
    # === general ===
    model_args: Annotated[
        str,
        Argument(
            help="Model arguments in the form key1=value1,key2=value2,... or path to yaml config file (see examples/model_configs/transformers_model.yaml)"
        ),
    ],
    tasks: Annotated[str, Argument(help="Comma-separated list of tasks to evaluate on.")],
    # === Common parameters ===
    use_chat_template: Annotated[
        bool, Option(help="Use chat template for evaluation.", rich_help_panel=HELP_PANEL_NAME_4)
    ] = False,
    system_prompt: Annotated[
        Optional[str], Option(help="Use system prompt for evaluation.", rich_help_panel=HELP_PANEL_NAME_4)
    ] = None,
    dataset_loading_processes: Annotated[
        int, Option(help="Number of processes to use for dataset loading.", rich_help_panel=HELP_PANEL_NAME_1)
    ] = 1,
    custom_tasks: Annotated[
        Optional[str], Option(help="Path to custom tasks directory.", rich_help_panel=HELP_PANEL_NAME_1)
    ] = None,
    cache_dir: Annotated[
        str, Option(help="Cache directory for datasets and models.", rich_help_panel=HELP_PANEL_NAME_1)
    ] = CACHE_DIR,
    num_fewshot_seeds: Annotated[
        int, Option(help="Number of seeds to use for few-shot evaluation.", rich_help_panel=HELP_PANEL_NAME_1)
    ] = 1,
    load_responses_from_details_date_id: Annotated[
        Optional[str], Option(help="Load responses from details directory.", rich_help_panel=HELP_PANEL_NAME_1)
    ] = None,
    # === saving ===
    output_dir: Annotated[
        str, Option(help="Output directory for evaluation results.", rich_help_panel=HELP_PANEL_NAME_2)
    ] = "results",
    push_to_hub: Annotated[
        bool, Option(help="Push results to the huggingface hub.", rich_help_panel=HELP_PANEL_NAME_2)
    ] = False,
    push_to_tensorboard: Annotated[
        bool, Option(help="Push results to tensorboard.", rich_help_panel=HELP_PANEL_NAME_2)
    ] = False,
    public_run: Annotated[
        bool, Option(help="Push results and details to a public repo.", rich_help_panel=HELP_PANEL_NAME_2)
    ] = False,
    results_org: Annotated[
        Optional[str], Option(help="Organization to push results to.", rich_help_panel=HELP_PANEL_NAME_2)
    ] = None,
    save_details: Annotated[
        bool, Option(help="Save detailed, sample per sample, results.", rich_help_panel=HELP_PANEL_NAME_2)
    ] = False,
    # === debug ===
    max_samples: Annotated[
        Optional[int], Option(help="Maximum number of samples to evaluate on.", rich_help_panel=HELP_PANEL_NAME_3)
    ] = None,
    job_id: Annotated[
        int, Option(help="Optional job id for future reference.", rich_help_panel=HELP_PANEL_NAME_3)
    ] = 0,
):
    """
    Evaluate models using sglang as backend.
    """
    import yaml

    from lighteval.logging.evaluation_tracker import EvaluationTracker
    from lighteval.models.model_input import GenerationParameters
    from lighteval.models.sglang.sglang_model import SGLangModelConfig
    from lighteval.pipeline import EnvConfig, ParallelismManager, Pipeline, PipelineParameters

    TOKEN = os.getenv("HF_TOKEN")

    env_config = EnvConfig(token=TOKEN, cache_dir=cache_dir)

    evaluation_tracker = EvaluationTracker(
        output_dir=output_dir,
        save_details=save_details,
        push_to_hub=push_to_hub,
        push_to_tensorboard=push_to_tensorboard,
        public=public_run,
        hub_results_org=results_org,
    )

    pipeline_params = PipelineParameters(
        launcher_type=ParallelismManager.SGLANG,
        env_config=env_config,
        job_id=job_id,
        dataset_loading_processes=dataset_loading_processes,
        custom_tasks_directory=custom_tasks,
        override_batch_size=-1,
        num_fewshot_seeds=num_fewshot_seeds,
        max_samples=max_samples,
        use_chat_template=use_chat_template,
        system_prompt=system_prompt,
        load_responses_from_details_date_id=load_responses_from_details_date_id,
    )

    if model_args.endswith(".yaml"):
        with open(model_args, "r") as f:
            config = yaml.safe_load(f)["model"]
        model_args = config["base_params"]["model_args"]
        generation_parameters = GenerationParameters.from_dict(config)
    else:
        generation_parameters = GenerationParameters()

    model_args_dict: dict = {k.split("=")[0]: k.split("=")[1] if "=" in k else True for k in model_args.split(",")}
    model_config = SGLangModelConfig(**model_args_dict, generation_parameters=generation_parameters)

    pipeline = Pipeline(
        tasks=tasks,
        pipeline_parameters=pipeline_params,
        evaluation_tracker=evaluation_tracker,
        model_config=model_config,
    )

    pipeline.evaluate()

    pipeline.show_results()

    results = pipeline.get_results()

    pipeline.save_and_push_results()

    return results
```
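The pivotal step above is the one-line parse that turns the CLI `model_args` string into `SGLangModelConfig` keyword arguments; an illustrative run of that exact expression (values stay strings, bare keys become `True`):

```python
# Illustration of the model_args parsing expression used above: bare keys
# (no "=") become True flags, everything else stays a string.
model_args = "pretrained=HuggingFaceH4/zephyr-7b-beta,dtype=float16,tp_size=4"
model_args_dict = {k.split("=")[0]: k.split("=")[1] if "=" in k else True for k in model_args.split(",")}
print(model_args_dict)
# {'pretrained': 'HuggingFaceH4/zephyr-7b-beta', 'dtype': 'float16', 'tp_size': '4'}
```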

src/lighteval/models/model_input.py

+26 -11
```diff
@@ -27,20 +27,20 @@
 @dataclass
 class GenerationParameters:
     early_stopping: Optional[bool] = None  # vllm, transformers
-    repetition_penalty: Optional[float] = None  # vllm, transformers, tgi
-    frequency_penalty: Optional[float] = None  # vllm, tgi
+    repetition_penalty: Optional[float] = None  # vllm, transformers, tgi, sglang
+    frequency_penalty: Optional[float] = None  # vllm, tgi, sglang
     length_penalty: Optional[float] = None  # vllm, transformers
-    presence_penalty: Optional[float] = None  # vllm
+    presence_penalty: Optional[float] = None  # vllm, sglang
 
-    max_new_tokens: Optional[int] = None  # vllm, transformers, tgi, litellm
-    min_new_tokens: Optional[int] = None  # vllm, transformers
+    max_new_tokens: Optional[int] = None  # vllm, transformers, tgi, litellm, sglang
+    min_new_tokens: Optional[int] = None  # vllm, transformers, sglang
 
-    seed: Optional[int] = None  # vllm, tgi litellm
-    stop_tokens: Optional[list[str]] = None  # vllm, transformers, tgi, litellm
-    temperature: Optional[float] = None  # vllm, transformers, tgi, litellm
-    top_k: Optional[int] = None  # vllm, transformers, tgi
-    min_p: Optional[float] = None  # vllm, transformers
-    top_p: Optional[int] = None  # vllm, transformers, tgi, litellm
+    seed: Optional[int] = None  # vllm, tgi, litellm
+    stop_tokens: Optional[list[str]] = None  # vllm, transformers, tgi, litellm, sglang
+    temperature: Optional[float] = None  # vllm, transformers, tgi, litellm, sglang
+    top_k: Optional[int] = None  # vllm, transformers, tgi, sglang
+    min_p: Optional[float] = None  # vllm, transformers, sglang
+    top_p: Optional[int] = None  # vllm, transformers, tgi, litellm, sglang
     truncate_prompt: Optional[bool] = None  # vllm, tgi
 
     @classmethod
@@ -182,3 +182,18 @@ def to_tgi_ie_dict(self) -> dict:
             "truncate": self.truncate_prompt,
         }
         return {k: v for k, v in args.items() if v is not None}
+
+    def to_sglang_dict(self) -> dict:
+        args = {
+            "max_new_tokens": self.max_new_tokens,
+            "temperature": self.temperature,
+            "stop": self.stop_tokens,
+            "top_p": self.top_p,
+            "top_k": self.top_k,
+            "min_p": self.min_p,
+            "frequency_penalty": self.frequency_penalty,
+            "presence_penalty": self.presence_penalty,
+            "repetition_penalty": self.repetition_penalty,
+            "min_new_tokens": self.min_new_tokens,
+        }
+        return {k: v for k, v in args.items() if v is not None}
```
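A small usage sketch of the new method (illustrative, not from the commit): because `None` values are filtered, sglang only receives explicitly-set parameters, and `stop_tokens` is renamed to sglang's `stop` key:

```python
from lighteval.models.model_input import GenerationParameters

# Only explicitly-set fields survive the None-filtering in to_sglang_dict().
params = GenerationParameters(temperature=0.3, top_p=0.9, max_new_tokens=256, stop_tokens=["<EOS>"])
print(params.to_sglang_dict())
# {'max_new_tokens': 256, 'temperature': 0.3, 'stop': ['<EOS>'], 'top_p': 0.9}
```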

src/lighteval/models/model_loader.py

+14
```diff
@@ -32,16 +32,19 @@
 from lighteval.models.endpoints.openai_model import OpenAIClient, OpenAIModelConfig
 from lighteval.models.endpoints.tgi_model import ModelClient, TGIModelConfig
 from lighteval.models.litellm_model import LiteLLMClient, LiteLLMModelConfig
+from lighteval.models.sglang.sglang_model import SGLangModel, SGLangModelConfig
 from lighteval.models.transformers.adapter_model import AdapterModel, AdapterModelConfig
 from lighteval.models.transformers.delta_model import DeltaModel, DeltaModelConfig
 from lighteval.models.transformers.transformers_model import TransformersModel, TransformersModelConfig
 from lighteval.models.vllm.vllm_model import VLLMModel, VLLMModelConfig
 from lighteval.utils.imports import (
     NO_LITELLM_ERROR_MSG,
+    NO_SGLANG_ERROR_MSG,
     NO_TGI_ERROR_MSG,
     NO_VLLM_ERROR_MSG,
     is_litellm_available,
     is_openai_available,
+    is_sglang_available,
     is_tgi_available,
     is_vllm_available,
 )
@@ -62,6 +65,7 @@ def load_model(  # noqa: C901
         VLLMModelConfig,
         OpenAIModelConfig,
         LiteLLMModelConfig,
+        SGLangModelConfig,
     ],
     env_config: EnvConfig,
 ) -> Union[TransformersModel, AdapterModel, DeltaModel, ModelClient, DummyModel]:
@@ -96,6 +100,9 @@ def load_model(  # noqa: C901
     if isinstance(config, VLLMModelConfig):
         return load_model_with_accelerate_or_default(config=config, env_config=env_config)
 
+    if isinstance(config, SGLangModelConfig):
+        return load_sglang_model(config=config, env_config=env_config)
+
     if isinstance(config, OpenAIModelConfig):
         return load_openai_model(config=config, env_config=env_config)
 
@@ -159,3 +166,10 @@ def load_model_with_accelerate_or_default(
 
 def load_dummy_model(config: DummyModelConfig, env_config: EnvConfig):
     return DummyModel(config=config, env_config=env_config)
+
+
+def load_sglang_model(config: SGLangModelConfig, env_config: EnvConfig):
+    if not is_sglang_available():
+        raise ImportError(NO_SGLANG_ERROR_MSG)
+
+    return SGLangModel(config=config, env_config=env_config)
```
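Putting the pieces together, a hypothetical end-to-end sketch of the new dispatch path; treating `pretrained` and `dtype` as `SGLangModelConfig` fields is an assumption inferred from the CLI `model_args` parsing above:

```python
from lighteval.models.model_loader import load_model
from lighteval.models.sglang.sglang_model import SGLangModelConfig
from lighteval.pipeline import EnvConfig

# Assumed fields: SGLangModelConfig takes the same key=value pairs as the CLI.
config = SGLangModelConfig(pretrained="HuggingFaceTB/SmolLM-1.7B", dtype="float16")

# load_model dispatches on config type; it raises ImportError(NO_SGLANG_ERROR_MSG)
# when sglang is not installed, otherwise returns an SGLangModel.
model = load_model(config=config, env_config=EnvConfig(token=None, cache_dir="~/.cache/huggingface"))
```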
