crates/bpe/README.md
On average it is about ~4x faster, since the short-cuts usually pay off.
## Benchmarks
We ran several benchmarks to compare performance of different encoders and a tiktoken implementation.
For the tiktoken implementation we used the [tiktoken-rs](https://crates.io/crates/tiktoken-rs) library, a wrapper around OpenAI's tiktoken implementation.
Note that tiktoken does not run BPE on the full input text.
Instead, it splits the input into large chunks using a regex and runs BPE on the individual chunks.
We have not tried to see if that approach is compatible with our BPE implementation.
We benchmarked the following scenarios:
- The first measures encoding runtime for our different encoders and the tiktoken Rust implementation.
  This shows a ~3.5x performance improvement for our fastest correct encoder compared to the tiktoken library.
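The chunked encoding strategy used by tiktoken can be sketched as follows. This is a minimal illustration, not the actual tiktoken or `bpe` crate API: `encode_chunk` is a hypothetical stand-in for a real BPE encoder, and we split on spaces instead of tiktoken's regex.

```rust
// Hypothetical stand-in for a real BPE encoder. Here it simply maps
// each byte to a u32 token so the example is self-contained.
fn encode_chunk(chunk: &str) -> Vec<u32> {
    chunk.bytes().map(|b| b as u32).collect()
}

// Sketch of tiktoken's approach: split the input into chunks first
// (tiktoken uses a regex; we split after each space for simplicity),
// then run the encoder on each chunk independently and concatenate
// the resulting token streams.
fn encode_chunked(text: &str) -> Vec<u32> {
    text.split_inclusive(' ')
        .flat_map(|chunk| encode_chunk(chunk))
        .collect()
}

fn main() {
    let tokens = encode_chunked("hello world");
    println!("{} tokens", tokens.len());
}
```

Because each chunk is encoded in isolation, merges can never cross a chunk boundary; whether that yields the same tokenization as running BPE over the full input depends on the chosen split regex.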