Description
The benchmark currently passes the temperature parameter to all models by default.
However, reasoning models such as gpt-5 do not support custom temperature values and only allow the default value. As a result, evaluation fails with errors like:
Error code: 400 - Unsupported value: 'temperature' does not support 0.0 with this model.
Because gpt-5 is the benchmark's default model, a run with default settings hits this error.
Expected behavior
Do not send temperature (and similar sampling parameters) to reasoning models.
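One possible way to implement this is to filter the request parameters before the API call. The sketch below is only an illustration: the helper names (`is_reasoning_model`, `build_request_kwargs`), the prefix list, and the set of sampling parameters are assumptions, not part of the benchmark's existing code.

```python
from typing import Any

# Assumption: reasoning models can be recognized by a name prefix.
# This list is illustrative, not exhaustive.
REASONING_MODEL_PREFIXES = ("gpt-5", "o1", "o3")

# Assumption: the sampling parameters to withhold from reasoning models.
SAMPLING_PARAMS = ("temperature", "top_p")


def is_reasoning_model(model: str) -> bool:
    """Return True if the model name matches a known reasoning-model prefix."""
    return model.lower().startswith(REASONING_MODEL_PREFIXES)


def build_request_kwargs(model: str, **params: Any) -> dict[str, Any]:
    """Build request keyword arguments, omitting sampling parameters
    for reasoning models and dropping any parameter set to None."""
    skip_sampling = is_reasoning_model(model)
    kwargs: dict[str, Any] = {"model": model}
    for name, value in params.items():
        if value is None:
            continue  # unset parameter: let the API use its default
        if skip_sampling and name in SAMPLING_PARAMS:
            continue  # reasoning models only accept the default value
        kwargs[name] = value
    return kwargs
```

With this helper, `build_request_kwargs("gpt-5", temperature=0.0)` would omit `temperature`, while the same call for a non-reasoning model would keep it. The resulting dict could then be unpacked into whatever client call the benchmark makes.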