`demos/image_generation/README.md` (79 additions, 0 deletions):

Output file (`output2.png`):
![output2](./output2.png)


## Measuring throughput
To increase throughput in image generation scenarios, adjust the plugin config to increase `NUM_STREAMS`. Additionally, set a static shape for the models to avoid dynamic-shape overhead. This can be done by setting the `resolution` parameter in the calculator node options in `graph.pbtxt`.

Edit `graph.pbtxt` and restart the server:
```
input_stream: "HTTP_REQUEST_PAYLOAD:input"
output_stream: "HTTP_RESPONSE_PAYLOAD:output"

node: {
  name: "ImageGenExecutor"
  calculator: "ImageGenCalculator"
  input_stream: "HTTP_REQUEST_PAYLOAD:input"
  input_side_packet: "IMAGE_GEN_NODE_RESOURCES:pipes"
  output_stream: "HTTP_RESPONSE_PAYLOAD:output"
  node_options: {
    [type.googleapis.com/mediapipe.ImageGenCalculatorOptions]: {
      models_path: "./"
      device: "CPU"
      num_images_per_prompt: 4  # 4 images per inference request
      resolution: "512x512"     # reshape models to a static resolution
      plugin_config: '{"PERFORMANCE_HINT":"THROUGHPUT","NUM_STREAMS":8}'
    }
  }
}
```
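
After editing the graph, restart the server so the models are reshaped and the new plugin config takes effect. A minimal sketch, assuming the Docker-based setup and directory layout used earlier in this demo (the mounted path and config file name are assumptions — adjust them to your deployment):
```bash
# Hypothetical restart, assuming the demo's earlier Docker setup;
# adjust the mounted path and config file to match your deployment.
docker run -d --rm -p 8000:8000 -v $(pwd)/models:/models:ro \
  openvino/model_server:latest --rest_port 8000 --config_path /models/config.json
```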

Prepare example request `input_data.json`:
```
{
  "data": [
    {
      "payload": [
        {
          "model": "OpenVINO/stable-diffusion-v1-5-int8-ov",
          "prompt": "dog",
          "num_inference_steps": 50
        }
      ]
    }
  ]
}
```
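
Before benchmarking, it can help to sanity-check the endpoint with a single request. A minimal sketch, assuming the server from this demo is listening on port 8000 (the response follows the OpenAI Images API shape, with base64-encoded images in `data[].b64_json`):
```bash
# Single test request against the image generation endpoint;
# the JSON response should contain base64-encoded image data.
curl -s http://localhost:8000/v3/images/generations \
  -H "Content-Type: application/json" \
  -d '{"model": "OpenVINO/stable-diffusion-v1-5-int8-ov", "prompt": "dog", "num_inference_steps": 50}'
```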

Run benchmark:
```bash
docker run --rm -it --net=host -v $(pwd):/work:rw nvcr.io/nvidia/tritonserver:24.12-py3-sdk \
perf_analyzer \
-m OpenVINO/stable-diffusion-v1-5-int8-ov \
--input-data=/work/input_data.json \
--service-kind=openai \
--endpoint=v3/images/generations \
--async \
-u localhost:8000 \
--request-count 16 \
--concurrency-range 16
```

```
*** Measurement Settings ***
  Service Kind: OPENAI
  Sending 16 benchmark requests
  Using asynchronous calls for inference

Request concurrency: 16
  Client:
    Request count: 16
    Throughput: 0.0999919 infer/sec
    Avg latency: 156783666 usec (standard deviation 1087845 usec)
    p50 latency: 157110315 usec
    p90 latency: 158720060 usec
    p95 latency: 158720060 usec
    p99 latency: 159494095 usec
    Avg HTTP time: 156783654 usec (send/recv 8717 usec + response wait 156774937 usec)
Inferences/Second vs. Client Average Batch Latency
Concurrency: 16, throughput: 0.0999919 infer/sec, latency 156783666 usec
```

A throughput of 0.0999919 infer/sec corresponds to roughly 0.4 images per second, since each request generates 4 images per prompt.
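
For reference, the conversion from request throughput to image throughput is a simple multiplication:
```bash
# requests/sec * images per request = images/sec
awk 'BEGIN { printf "%.2f images/sec\n", 0.0999919 * 4 }'
```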


## References
`docs/image_generation/reference.md` (5 additions, 0 deletions):

The calculator supports the following `node_options` for tuning the pipeline configuration:
- `optional uint64 default_num_inference_steps` - default number of inference steps used for generation, if not specified by the request [default = 50];
- `optional uint64 max_num_inference_steps` - maximum number of inference steps allowed for generation. Requests exceeding this value will be rejected. [default = 100];

Static model resolution settings:
- `optional string resolution` - enforces static resolution for all requests. When specified, underlying models are reshaped to this resolution.
- `optional uint64 num_images_per_prompt` - used together with `resolution` to define the batch size in the static model shape.
- `optional float guidance_scale` - used together with `resolution`; the guidance scale is taken into account when reshaping, since classifier-free guidance affects the static batch size of the underlying models.
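
A minimal sketch of `node_options` using the static-shape settings together (mirroring the throughput example in the demo README; the values shown are illustrative, not recommendations):
```
node_options: {
  [type.googleapis.com/mediapipe.ImageGenCalculatorOptions]: {
    models_path: "./"
    device: "CPU"
    resolution: "512x512"      # reshape models to a static 512x512 resolution
    num_images_per_prompt: 4   # static batch: 4 images per request
    guidance_scale: 7.5        # considered when reshaping (guidance affects batch size)
  }
}
```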


## Models Directory
