README.md: 34 additions & 20 deletions
# 🏋️‍♂️ BenchLLM 🏋️‍♀️
🦾 Continuous Integration for LLM-powered applications 🦙🦅🤖
BenchLLM is a Python-based open-source library that streamlines the testing of Large Language Models (LLMs) and AI-powered applications. It measures the accuracy of your model, agents, or chains by validating responses on any number of tests via LLMs.
BenchLLM is actively used at [V7](https://www.v7labs.com) for improving our LLM applications and is now open sourced under the MIT License to share with the wider community.
## 💡 Get help on [Discord](https://discord.gg/x7ExfHb3bG) or [Tweet at us](https://twitter.com/V7Labs)
<hr/>
Use BenchLLM to:
- Test the responses of your LLM across any number of prompts.
- Continuous integration for chains like [Langchain](https://github.com/hwchase17/langchain), agents like [AutoGPT](https://github.com/Significant-Gravitas/Auto-GPT), or LLM models like [Llama](https://github.com/facebookresearch/llama) or GPT-4.
- Eliminate flaky chains and create confidence in your code.
- Spot inaccurate responses and hallucinations in your application at every version.
<hr/>
> ⚠️ **NOTE:** BenchLLM is in the early stage of development and will be subject to rapid changes.
>
> For bug reporting, feature requests, or contributions, please open an issue or submit a pull request (PR) on our GitHub page.
## 🧪 BenchLLM Testing Methodology
BenchLLM implements a distinct two-step methodology for validating your machine learning models:
1. **Testing**: This stage involves running your code against any number of expected responses and capturing the predictions produced by your model without immediate judgment or comparison.
2. **Evaluation**: The recorded predictions are compared against the expected output using LLMs to verify factual similarity (or optionally manually). Detailed comparison reports, including pass/fail status and other metrics, are generated.
This methodical separation offers a comprehensive view of your model's performance and allows for better control and refinement of each step.
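As a conceptual illustration only (plain Python, not the BenchLLM API; `fake_model`, the test dictionaries, and the string-match check are stand-ins), the two steps can be sketched like this:

```python
# Conceptual sketch of the two-step methodology: first capture
# predictions without judging them, then evaluate separately.

def fake_model(prompt: str) -> str:
    # Stand-in for your LLM, agent, or chain.
    return {"What is 1+1?": "2", "Capital of France?": "Paris"}.get(prompt, "")

tests = [
    {"input": "What is 1+1?", "expected": ["2"]},
    {"input": "Capital of France?", "expected": ["Paris", "paris"]},
]

# Step 1 - Testing: run the code and record the raw predictions only.
predictions = [{"test": t, "output": fake_model(t["input"])} for t in tests]

# Step 2 - Evaluation: compare recorded predictions to the expectations.
# (BenchLLM can do this with an LLM judge, a string match, or manually.)
report = [
    {"input": p["test"]["input"], "passed": p["output"] in p["test"]["expected"]}
    for p in predictions
]

print(report)  # each entry carries a pass/fail status
```

Because the predictions are recorded before any judgment happens, either step can be rerun or swapped out independently.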
## 🚀 Install
To install BenchLLM, use pip:
```
pip install benchllm
```
## 💻 Usage
Start by importing the library and using the `@benchllm.test` decorator to mark the function you'd like to test:
The non-interactive evaluators also support `--workers N` to run the evaluation in parallel:

```bash
$ bench run --evaluator string-match --workers 5
```
### 🧮 Eval
While _bench run_ runs each test function and then evaluates their output, it can often be beneficial to separate these into two steps. For example, if you want a person to manually do the evaluation or if you want to try multiple evaluation methods on the same function.
```bash
$ bench run --no-eval
```

Then later you can evaluate them with:

```bash
$ bench eval output/latest/predictions
```
## 🔌 API
For more detailed control, BenchLLM provides an API.
You are not required to add YML/JSON tests to be able to evaluate your model.
```python
results = evaluator.run()
print(results)
```
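For reference, the YML/JSON tests mentioned above are small files that pair an input with one or more acceptable responses. A minimal sketch, assuming `input` and `expected` as the field names (verify against the documentation):

```yml
input: "What's 1+1? Be very terse."
expected:
  - 2
  - "1 + 1 equals 2"
```

Any response matching one of the `expected` entries counts as a pass during evaluation.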
## ☕️ Commands
- `bench add`: Add a new test to a suite.
- `bench tests`: List all tests in a suite.
- `bench run`: Run all or target test suites.
- `bench eval`: Run the evaluation of an existing test run.
## 🙌 Contribute
BenchLLM is developed for Python 3.10, although it may work with other Python versions as well. We recommend using a Python 3.10 environment. You can use conda or any other environment manager to set up the environment:
Contribution steps:
4. Test your changes.
5. Submit a pull request.
We adhere to the PEP8 style guide. Please follow this guide when contributing.
If you need any support, feel free to open an issue on our GitHub page.