<img width="524" height="260" alt="Image" src="https://github.com/user-attachments/assets/9ef3de38-0681-4f57-839b-7d690b4d7552" />

The vertical axis is in ms; QPM = 2000. The model running on the service is xlm-roberta-large.onnx, with dynamic batch_size and sequence_length. Here is the config.pbtxt:

    name: "xlm-roberta-large-onnx"
    backend: "onnxruntime"
    dynamic_batching { }
    max_batch_size: 16
    input [
      {
        name: "input_ids"
        data_type: TYPE_INT64
        dims: [-1]
      },
      {
        name: "attention_mask"
        data_type: TYPE_INT64
        dims: [-1]
      }
    ]
    output [
      {
        name: "embeds"
        data_type: TYPE_FP32
        dims: [512]
      }
    ]
    instance_group [
      {
        count: 1
        kind: KIND_GPU
      }
    ]
    parameters {
      key: "execution_mode"
      value: { string_value: "1" }
    }
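
For reference, here is a minimal sketch of the request shapes the config.pbtxt above implies, plus the request spacing at QPM = 2000. It is only an illustration, not the actual client code: `build_batch` and `mean_interarrival_ms` are hypothetical helpers, and `PAD_ID = 1` is an assumption about the tokenizer's pad token id.

```python
from typing import List, Tuple

MAX_BATCH_SIZE = 16  # max_batch_size from the config.pbtxt
PAD_ID = 1           # assumption: pad token id used by the tokenizer


def build_batch(token_ids: List[List[int]]) -> Tuple[List[List[int]], List[List[int]]]:
    """Pad variable-length sequences to one length.

    Returns (input_ids, attention_mask), each shaped [batch, seq_len],
    matching the two TYPE_INT64 inputs with dims [-1] in the config.
    """
    if not 0 < len(token_ids) <= MAX_BATCH_SIZE:
        raise ValueError(f"batch size must be in 1..{MAX_BATCH_SIZE}")
    seq_len = max(len(ids) for ids in token_ids)
    input_ids = [ids + [PAD_ID] * (seq_len - len(ids)) for ids in token_ids]
    attention_mask = [[1] * len(ids) + [0] * (seq_len - len(ids)) for ids in token_ids]
    return input_ids, attention_mask


def mean_interarrival_ms(qpm: float) -> float:
    """Average gap between requests, in ms, for a given queries-per-minute load."""
    return 60_000.0 / qpm


if __name__ == "__main__":
    ids, mask = build_batch([[0, 5, 6, 2], [0, 7, 2]])
    print(ids)   # second row is padded to length 4
    print(mask)  # padded positions are masked with 0
    print(mean_interarrival_ms(2000))  # average gap at QPM = 2000
```

At QPM = 2000 requests arrive on average every 30 ms, so with a single GPU instance the latency depends on how many requests the dynamic batcher can group within that window.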