/// The language of the input audio. Supplying the input language in ISO-639-1 format will improve accuracy and latency.
#[serde(skip_serializing_if = "Option::is_none")]
pub language: Option<String>,
/// Controls how the audio is cut into chunks. When set to "auto", the server first normalizes loudness and then uses voice activity detection (VAD) to choose boundaries. A server_vad object can be provided to tweak VAD detection parameters manually. If unset, the audio is transcribed as a single block.
/// An optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language.
#[serde(skip_serializing_if = "Option::is_none")]
pub prompt: Option<String>,
/// The format of the transcript output, in one of these options: json, text, srt, verbose_json, or vtt.
#[serde(skip_serializing_if = "Option::is_none")]
pub response_format: Option<AudioOutputFormat>,
/// If set to true, the model response data will be streamed to the client as it is generated using server-sent events. Note: Streaming is not supported for the whisper-1 model and will be ignored.
#[serde(skip_serializing_if = "Option::is_none")]
pub stream: Option<bool>,
/// The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random,
/// while lower values like 0.2 will make it more focused and deterministic.
/// If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit.
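
The optional fields above all carry `skip_serializing_if = "Option::is_none"`, so a field left as `None` is omitted from the serialized request rather than sent as an empty value. The following is a minimal, std-only sketch of that behaviour, not the crate's actual implementation: `TranscriptionOptions`, `to_form_fields`, and `validate` are hypothetical names, and the `temperature` field name and its `f32` type are assumptions, since the field itself is not shown in this diff.

```rust
/// Hypothetical, trimmed-down mirror of the request fields shown above.
/// The `temperature` field name and type are assumptions.
#[derive(Debug, Default)]
pub struct TranscriptionOptions {
    pub language: Option<String>,
    pub prompt: Option<String>,
    pub stream: Option<bool>,
    pub temperature: Option<f32>,
}

impl TranscriptionOptions {
    /// Collect only the fields that are set, as (name, value) pairs.
    /// This mimics what skipping `None` fields achieves during serialization.
    pub fn to_form_fields(&self) -> Vec<(&'static str, String)> {
        let mut fields = Vec::new();
        if let Some(language) = &self.language {
            fields.push(("language", language.clone()));
        }
        if let Some(prompt) = &self.prompt {
            fields.push(("prompt", prompt.clone()));
        }
        if let Some(stream) = self.stream {
            fields.push(("stream", stream.to_string()));
        }
        if let Some(temperature) = self.temperature {
            fields.push(("temperature", temperature.to_string()));
        }
        fields
    }

    /// Check the documented range for the sampling temperature (0 to 1).
    pub fn validate(&self) -> Result<(), String> {
        match self.temperature {
            // NaN also fails `contains`, so it is rejected here as well.
            Some(t) if !(0.0..=1.0).contains(&t) => {
                Err(format!("temperature {t} is outside 0..=1"))
            }
            _ => Ok(()),
        }
    }
}
```

With only `language` and `stream` set, `to_form_fields` returns just those two pairs; unset options never reach the wire. Validating the range on the client is a design choice for this sketch, not something the diff itself does.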
/// Must be set to "server_vad" to enable manual chunking using server side VAD.
pub r#type: VadConfigType,
/// Amount of audio to include before the VAD detected speech (in milliseconds).
pub prefix_padding_ms: Option<usize>,
/// Duration of silence to detect speech stop (in milliseconds). With shorter values the model will respond more quickly, but may jump in on short pauses from the user.
pub silence_duration_ms: Option<usize>,
/// Sensitivity threshold (0.0 to 1.0) for voice activity detection. A higher threshold will require louder audio to activate the model, and thus might perform better in noisy environments.
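
The VAD settings above can be sketched as a small, std-only type. Because `type` is a reserved word in Rust, the field is written with the raw-identifier syntax `r#type`. Everything else here is an assumption made for illustration: the `ServerVad` variant, the `threshold` field name and its `f32` type, and the `server_vad` and `validate` helpers do not appear in this diff.

```rust
/// Hypothetical stand-in for the `VadConfigType` referenced in the diff.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VadConfigType {
    ServerVad,
}

/// Hypothetical mirror of the VAD settings.
/// The `threshold` field name and type are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct VadConfig {
    /// `type` is a keyword, so the raw identifier `r#type` is required.
    pub r#type: VadConfigType,
    pub prefix_padding_ms: Option<usize>,
    pub silence_duration_ms: Option<usize>,
    pub threshold: Option<f32>,
}

impl VadConfig {
    /// Start from server-side VAD with every tunable left unset.
    pub fn server_vad() -> Self {
        VadConfig {
            r#type: VadConfigType::ServerVad,
            prefix_padding_ms: None,
            silence_duration_ms: None,
            threshold: None,
        }
    }

    /// Check that the sensitivity threshold lies in the documented 0.0 to 1.0 range.
    pub fn validate(&self) -> Result<(), String> {
        match self.threshold {
            Some(t) if !(0.0..=1.0).contains(&t) => {
                Err(format!("threshold {t} is outside 0.0..=1.0"))
            }
            _ => Ok(()),
        }
    }
}
```

Struct-update syntax keeps call sites short, for example `VadConfig { silence_duration_ms: Some(300), ..VadConfig::server_vad() }` to shorten the silence window while leaving the other parameters unset.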