
[FEAT]: Specify whisper transcription language #2928

Open
khalilxg opened this issue Jan 2, 2025 · 3 comments
Labels
enhancement (New feature or request), feature request

Comments


khalilxg commented Jan 2, 2025

How are you running AnythingLLM?

Docker (local)

What happened?

I'm encountering an issue with the Whisper integration in AnythingLLM. Despite setting the language parameter to "ar" in the OpenAI Whisper API, the transcription often returns transliterated Arabic (Arabic words in Latin script) instead of Arabic script. I've tried various methods to address this, but none have worked so far.

Expected Behavior: The transcription should return Arabic text in Arabic script (e.g., "مرحبا" for "hello").

Actual Behavior: The transcription returns transliterated Arabic in Latin script (e.g., "marhaban" for "hello").

Environment:

AnythingLLM Version: Docker (latest)
Operating System: Debian
Additional Context: I've followed the Whisper documentation and confirmed that the language parameter is set correctly. This issue might be related to how the API processes Arabic audio or interprets the transcription language.

Request for Resolution: Please provide guidance or a workaround to force Whisper to transcribe Arabic speech into Arabic script. If this is a limitation of the current implementation, a feature to enforce script-based output would be appreciated.

Are there known steps to reproduce?

Steps to Reproduce:

1. Provide an Arabic audio file.
2. Configure the Whisper transcription with the following parameters:
   - model: "whisper-1"
   - language: "ar"
   - temperature: 0
3. Check the transcription output.

@khalilxg khalilxg added the possible bug Bug was reported but is not confirmed or is unable to be replicated. label Jan 2, 2025
@timothycarambat timothycarambat added enhancement New feature or request feature request and removed possible bug Bug was reported but is not confirmed or is unable to be replicated. labels Jan 2, 2025
@timothycarambat timothycarambat changed the title [BUG]: whisper transcription return only in latin [FEAT]: Specify whisper transcription language Jan 2, 2025

timothycarambat commented Jan 2, 2025

This needs an appropriate supported language code here:

```js
const { text } = await transcriber(audioData, {
  language: "en", // ISO-code
  // ...
});
```
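As a sketch of how a configurable language could be wired into that call: read the desired code from a setting and fall back to English when it is missing or not a plausible ISO 639-1 code. `WHISPER_LANGUAGE` is a hypothetical name, not an existing AnythingLLM environment variable:

```javascript
// Sketch: resolve the transcription language from a (hypothetical)
// WHISPER_LANGUAGE setting, defaulting to "en" when absent or malformed.
function resolveWhisperLanguage(env = process.env) {
  const candidate = (env.WHISPER_LANGUAGE || "").trim().toLowerCase();
  // ISO 639-1 codes are two ASCII letters ("en", "ar", "fr", ...).
  return /^[a-z]{2}$/.test(candidate) ? candidate : "en";
}

// Usage inside the transcriber call shown above:
// const { text } = await transcriber(audioData, {
//   language: resolveWhisperLanguage(), // e.g. "ar" for Arabic
// });
```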

khalilxg commented Jan 2, 2025

I'm using OpenAI Whisper's API, and the target language is Arabic. At first I just added the language variable, but I still got the same issue, so I updated some lines in
anything-llm-master/collector/utils/WhisperProviders/OpenAiWhisper.js
to:
```js
const fs = require("fs");

class OpenAiWhisper {
  constructor({ options }) {
    const { OpenAI: OpenAIApi } = require("openai");
    if (!options.openAiKey) throw new Error("No OpenAI API key was set.");

    this.openai = new OpenAIApi({
      apiKey: options.openAiKey,
    });
    this.model = "whisper-1";
    this.temperature = 0;
    this.#log("Initialized.");
    // Added: force Arabic transcription.
    this.language = "ar";
    this.task = "transcribe";
  }

  #log(text, ...args) {
    console.log(`\x1b[32m[OpenAiWhisper]\x1b[0m ${text}`, ...args);
  }

  async processFile(fullFilePath) {
    return await this.openai.audio.transcriptions
      .create({
        file: fs.createReadStream(fullFilePath),
        model: this.model,
        // Added: an Arabic prompt ("Hello, my name is Joe, a native Arabic
        // speaker, and today I will have a conversation in Arabic...") to
        // nudge the model toward Arabic-script output.
        prompt:
          "مرحبًا، اسمي جو، متحدث أصلي للغة العربية، وسأجري اليوم محادثة باللغة العربية حول موضوع قد تجده مثيرًا للاهتمام للغاية.",
        temperature: this.temperature,
        language: this.language,
        task: this.task,
      })
      .then((response) => {
        if (!response) {
          return {
            content: "",
            error: "No content was able to be transcribed.",
          };
        }

        return { content: response.text, error: null };
      })
      .catch((error) => {
        this.#log(
          "Could not get any response from openai whisper",
          error.message
        );
        return { content: "", error: error.message };
      });
  }
}

module.exports = {
  OpenAiWhisper,
};
```
I added prompt, language, and task, but I still get the same issue: the input audio is pure Arabic speech, yet I immediately get Latin-script output.

timothycarambat (Member) commented:
In that case: https://platform.openai.com/docs/guides/speech-to-text#prompting

> Some languages can be written in different ways, such as simplified or traditional Chinese. The model might not always use the writing style that you want for your transcript by default. You can improve this by using a prompt in your preferred writing style.

The `language` parameter helps the model determine the input language, not the output script. Adding a prompt written in Arabic that instructs the model to produce its output in Arabic may help, but it is not foolproof.

Googling this shows the issue is pretty common among Whisper model users. Most wind up post-processing the output with an LLM for translation, so that is the current state of Whisper 🤷
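A sketch of that post-processing workaround, assuming the official `openai` npm package; the model name and prompt wording are illustrative assumptions, not part of AnythingLLM:

```javascript
// Detect when Whisper returned transliterated (Latin-script) text and, only
// then, ask a chat model to rewrite it in Arabic script.

// Pure helper: true when the text contains no characters from the Unicode
// Arabic block (U+0600–U+06FF), i.e. the transcript is likely transliterated.
function looksTransliterated(text) {
  return !/[\u0600-\u06FF]/.test(text);
}

async function toArabicScript(openai, transcript) {
  if (!looksTransliterated(transcript)) return transcript;
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini", // illustrative; any chat-capable model works
    messages: [
      {
        role: "system",
        content:
          "Rewrite the user's transliterated Arabic text in Arabic script. " +
          "Output only the rewritten text.",
      },
      { role: "user", content: transcript },
    ],
  });
  return completion.choices[0].message.content;
}

module.exports = { looksTransliterated, toArabicScript };
```

This keeps the extra LLM call conditional, so transcripts that already came back in Arabic script are passed through untouched.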
