Add number for latency comparison (#4612)

This PR adds latency comparison
deepspeedai · Nov 3, 2023 · ff53c22
1 parent 58d3b65
Showing 1 changed file with 2 additions and 0 deletions.
diff --git a/blogs/deepspeed-fastgen/README.md b/blogs/deepspeed-fastgen/README.md
@@ -165,10 +165,12 @@ When vLLM preempts the ongoing generation of previous requests, the generation l
 ### D. Token Level Timing Analysis
 
 Figure 5 displays the P50, P90, and P95 latencies of the generation processes. Both vLLM and DeepSpeed-FastGen exhibit similar P50 latencies, but vLLM demonstrates significantly higher latencies for P90 and P95.
+At P95, DeepSpeed-FastGen's latency is 3.7x lower than vLLM's.
 
 This discrepancy is due to a noticeable spike in vLLM's generation latency when it preempts the ongoing generation to process new prompts.
 In contrast, DeepSpeed-FastGen typically processes the prompt and generation for previous requests concurrently, leading to much more consistent generation latency.
 
+
 <div align="center">
   <img src="assets/images/token_latency.png" alt="" width="400"/><br>