Releases: huggingface/transformers.js
3.4.1
What's new?
- Add support for SNAC (Multi-Scale Neural Audio Codec) in #1251
- Add support for Metric3D (v1 & v2) in #1254
- Add support for Gemma 3 text in #1229 (a usage sketch follows this list). Note: Only Node.js execution is supported for now.
- Safeguard against background removal pipeline precision issues in #1255. Thanks to @LuSrodri for reporting the issue!
- Allow RawImage to read from all types of supported sources by @BritishWerewolf in #1244
- Update pipelines.md api docs in #1256
- Update extension example to use latest version by @fs-eire in #1213
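As a quick illustration of the Gemma 3 text support above, here is a minimal Node.js sketch using the text-generation pipeline. The model ID and dtype are assumptions; substitute a Gemma 3 ONNX checkpoint from the Hub.

import { pipeline } from "@huggingface/transformers";

// Hypothetical checkpoint ID; pick an actual Gemma 3 ONNX model from the Hub
const generator = await pipeline("text-generation", "onnx-community/gemma-3-1b-it-ONNX", {
  dtype: "q4", // assumed quantization; other dtypes may be available
});

const messages = [
  { role: "user", content: "Write me a short poem about the sea." },
];
const output = await generator(messages, { max_new_tokens: 128 });
console.log(output[0].generated_text.at(-1).content);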
Full Changelog: 3.4.0...3.4.1
3.4.0
🚀 Transformers.js v3.4 — Background Removal Pipeline, Ultravox, DAC, Mimi, SmolVLM2, LiteWhisper.
- 🖼️ Background Removal Pipeline
- 🤖 New models: Ultravox, DAC, Mimi, SmolVLM2, LiteWhisper
- 🛠️ Other improvements
- 🤗 New contributors
🖼️ New Background Removal Pipeline
Removing backgrounds from images is now as easy as:
import { pipeline } from "@huggingface/transformers";
const segmenter = await pipeline("background-removal", "onnx-community/BEN2-ONNX");
const output = await segmenter("input.png");
output[0].save("output.png"); // (Optional) Save the image
You can find the full list of compatible models here, which will continue to grow in the future! 🔥 For more information, check out #1216.
🤖 New models
- Ultravox for audio-text-to-text generation (#1207). See here for the list of supported models.
See example usage
import { UltravoxProcessor, UltravoxModel, read_audio } from "@huggingface/transformers";

const processor = await UltravoxProcessor.from_pretrained(
  "onnx-community/ultravox-v0_5-llama-3_2-1b-ONNX",
);
const model = await UltravoxModel.from_pretrained(
  "onnx-community/ultravox-v0_5-llama-3_2-1b-ONNX",
  {
    dtype: {
      embed_tokens: "q8", // "fp32", "fp16", "q8"
      audio_encoder: "q4", // "fp32", "fp16", "q8", "q4", "q4f16"
      decoder_model_merged: "q4", // "q8", "q4", "q4f16"
    },
  },
);

const audio = await read_audio("http://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/mlk.wav", 16000);
const messages = [
  {
    role: "system",
    content: "You are a helpful assistant.",
  },
  { role: "user", content: "Transcribe this audio:<|audio|>" },
];
const text = processor.tokenizer.apply_chat_template(messages, {
  add_generation_prompt: true,
  tokenize: false,
});

const inputs = await processor(text, audio);
const generated_ids = await model.generate({
  ...inputs,
  max_new_tokens: 128,
});

const generated_texts = processor.batch_decode(
  generated_ids.slice(null, [inputs.input_ids.dims.at(-1), null]),
  { skip_special_tokens: true },
);
console.log(generated_texts[0]);
// "I can transcribe the audio for you. Here's the transcription:\n\n\"I have a dream that one day this nation will rise up and live out the true meaning of its creed.\"\n\n- Martin Luther King Jr.\n\nWould you like me to provide the transcription in a specific format (e.g., word-for-word, character-for-character, or a specific font)?"
- DAC and Mimi for audio tokenization/neural audio codecs (#1215). See here for the list of supported DAC models and here for the list of supported Mimi models.
See example usage
DAC:
import { DacModel, AutoFeatureExtractor } from '@huggingface/transformers';

const model_id = "onnx-community/dac_16khz-ONNX";
const model = await DacModel.from_pretrained(model_id);
const feature_extractor = await AutoFeatureExtractor.from_pretrained(model_id);

const audio_sample = new Float32Array(12000);

// pre-process the inputs
const inputs = await feature_extractor(audio_sample);
{
  // explicitly encode then decode the audio inputs
  const encoder_outputs = await model.encode(inputs);
  const { audio_values } = await model.decode(encoder_outputs);
  console.log(audio_values);
}
{
  // or the equivalent with a forward pass
  const { audio_values } = await model(inputs);
  console.log(audio_values);
}
Mimi:
import { MimiModel, AutoFeatureExtractor } from '@huggingface/transformers';

const model_id = "onnx-community/kyutai-mimi-ONNX";
const model = await MimiModel.from_pretrained(model_id);
const feature_extractor = await AutoFeatureExtractor.from_pretrained(model_id);

const audio_sample = new Float32Array(12000);

// pre-process the inputs
const inputs = await feature_extractor(audio_sample);
{
  // explicitly encode then decode the audio inputs
  const encoder_outputs = await model.encode(inputs);
  const { audio_values } = await model.decode(encoder_outputs);
  console.log(audio_values);
}
{
  // or the equivalent with a forward pass
  const { audio_values } = await model(inputs);
  console.log(audio_values);
}
- SmolVLM2, a lightweight multimodal model designed to analyze image and video content (#1196). See here for the list of supported models. Usage is identical to SmolVLM.
- LiteWhisper for automatic speech recognition (#1219). See here for the list of supported models. Usage is identical to Whisper; see the sketch below.
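Since LiteWhisper usage mirrors Whisper, the sketch below uses the standard automatic-speech-recognition pipeline. The model ID here is an assumption; use any checkpoint from the linked list of supported models.

import { pipeline } from "@huggingface/transformers";

// Hypothetical model ID; substitute a LiteWhisper checkpoint from the supported list
const transcriber = await pipeline(
  "automatic-speech-recognition",
  "onnx-community/lite-whisper-large-v3-turbo-ONNX",
);

const url = "https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav";
const output = await transcriber(url);
console.log(output.text);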
🛠️ Other improvements
- Add support for multi-chunk external data files in #1212
- Fix package export by @fs-eire in #1161
- Add NFD normalizer in #1211. Thanks to @adewdev for reporting!
- Documentation improvements by @viksit in #1184
- Optimize conversion script in #1204 and #1218
- Use Float16Array instead of Uint16Array for kvcache when available in #1208
🤗 New contributors
- @axrati made their first contribution in #602
- @viksit made their first contribution in #1184
- @tangkunyin made their first contribution in #1203
Full Changelog: 3.3.3...3.4.0
3.3.3
3.3.2
What's new?
- Add support for Helium and Glm in #1156
- Improve build process and fix usage with certain bundlers in #1158
- Auto-detect wordpiece tokenizer when model.type is missing in #1151
- Update Moonshine config values for transformers v4.48.0 in #1155
- Support simultaneous tensor op execution in WASM in #1162
- Update react tutorial sample code in #1152
Full Changelog: 3.3.1...3.3.2
3.3.1
3.3.0
🔥 Transformers.js v3.3 — StyleTTS 2 (Kokoro) for state-of-the-art text-to-speech, Grounding DINO for zero-shot object detection
🤖 New models: StyleTTS 2, Grounding DINO
StyleTTS 2 for high-quality speech synthesis
See #1148 for more information and here for the list of supported models.
First, install the kokoro-js library, which uses Transformers.js, from NPM using:
npm i kokoro-js
You can then generate speech as follows:
import { KokoroTTS } from "kokoro-js";
const model_id = "onnx-community/Kokoro-82M-ONNX";
const tts = await KokoroTTS.from_pretrained(model_id, {
dtype: "q8", // Options: "fp32", "fp16", "q8", "q4", "q4f16"
});
const text = "Life is like a box of chocolates. You never know what you're gonna get.";
const audio = await tts.generate(text, {
// Use `tts.list_voices()` to list all available voices
voice: "af_bella",
});
audio.save("audio.wav");
Grounding DINO for zero-shot object detection
See #1137 for more information and here for the list of supported models.
Example: Zero-shot object detection with onnx-community/grounding-dino-tiny-ONNX using the pipeline API.
import { pipeline } from "@huggingface/transformers";
const detector = await pipeline("zero-shot-object-detection", "onnx-community/grounding-dino-tiny-ONNX");
const url = "http://images.cocodataset.org/val2017/000000039769.jpg";
const candidate_labels = ["a cat."];
const output = await detector(url, candidate_labels, {
threshold: 0.3,
});
See example output
[
{ score: 0.45316222310066223, label: "a cat", box: { xmin: 343, ymin: 23, xmax: 637, ymax: 372 } },
{ score: 0.36190420389175415, label: "a cat", box: { xmin: 12, ymin: 52, xmax: 317, ymax: 472 } },
]
🛠️ Other improvements
- Add the RawAudio class by @Th3G33k in #682 (see the sketch after this list)
- Update React guide for v3 by @sroussey in #1128
- Add option to skip special tokens in TextStreamer by @sroussey in #1139
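To give a feel for the new RawAudio class from #682, here is a minimal sketch that wraps raw samples and writes them out as a WAV file. The constructor signature (samples, then sampling rate) is an assumption based on how the class is used elsewhere in the library.

import { RawAudio } from "@huggingface/transformers";

// One second of silence at 16 kHz (constructor signature assumed: samples, sampling_rate)
const samples = new Float32Array(16000);
const audio = new RawAudio(samples, 16000);

// Write the audio to disk as a WAV file
await audio.save("silence.wav");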
🤗 New contributors
Full Changelog: 3.2.4...3.3.0
3.2.4
What's new?
- Add support for visualizing self-attention heatmaps in #1117
Example code
import { AutoProcessor, AutoModelForImageClassification, interpolate_4d, RawImage } from "@huggingface/transformers";

// Load model and processor
const model_id = "onnx-community/dinov2-with-registers-small-with-attentions";
const model = await AutoModelForImageClassification.from_pretrained(model_id);
const processor = await AutoProcessor.from_pretrained(model_id);

// Load image from URL
const image = await RawImage.read("https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/cats.jpg");

// Pre-process image
const inputs = await processor(image);

// Perform inference
const { logits, attentions } = await model(inputs);

// Get the predicted class
const cls = logits[0].argmax().item();
const label = model.config.id2label[cls];
console.log(`Predicted class: ${label}`);

// Set config values
const patch_size = model.config.patch_size;
const [width, height] = inputs.pixel_values.dims.slice(-2);
const w_featmap = Math.floor(width / patch_size);
const h_featmap = Math.floor(height / patch_size);
const num_heads = model.config.num_attention_heads;
const num_cls_tokens = 1;
const num_register_tokens = model.config.num_register_tokens ?? 0;

// Visualize attention maps
const selected_attentions = attentions
  .at(-1) // we are only interested in the attention maps of the last layer
  .slice(0, null, 0, [num_cls_tokens + num_register_tokens, null])
  .view(num_heads, 1, w_featmap, h_featmap);

const upscaled = await interpolate_4d(selected_attentions, {
  size: [width, height],
  mode: "nearest",
});

for (let i = 0; i < num_heads; ++i) {
  const head_attentions = upscaled[i];
  const minval = head_attentions.min().item();
  const maxval = head_attentions.max().item();
  const image = RawImage.fromTensor(
    head_attentions
      .sub_(minval)
      .div_(maxval - minval)
      .mul_(255)
      .to("uint8"),
  );
  await image.save(`attn-head-${i}.png`);
}
- Add min, max, argmin, argmax tensor ops for dim=null (a small example follows this list)
- Add support for nearest-neighbour interpolation in interpolate_4d
- Depth Estimation pipeline improvements (faster & returns resized depth map); a usage sketch also follows this list
- TypeScript improvements by @ocavue and @shrirajh in #1081 and #1122
- Remove unused imports from tokenizers.js by @pratapvardhan in #1116
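As a quick sketch of the new dim=null (whole-tensor) reductions, the example below builds a small Tensor and reads back scalar results with item(); the commented values follow directly from the input data.

import { Tensor } from "@huggingface/transformers";

// A 2x3 tensor of floats
const t = new Tensor("float32", new Float32Array([1, 5, 3, 2, 4, 0]), [2, 3]);

// With no dimension specified, the ops reduce over the whole tensor
console.log(t.min().item());    // 0
console.log(t.max().item());    // 5
console.log(t.argmin().item()); // flattened index of the smallest value (5)
console.log(t.argmax().item()); // flattened index of the largest value (1)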
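And for the Depth Estimation pipeline improvements, a minimal sketch is shown below. The model ID is an assumption; the output destructuring follows the pipeline's existing { predicted_depth, depth } shape, with depth now resized to match the input image.

import { pipeline } from "@huggingface/transformers";

// Model ID is an assumption; any supported depth-estimation checkpoint should work
const depth_estimator = await pipeline("depth-estimation", "onnx-community/depth-anything-v2-small");

const url = "https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/cats.jpg";
const { depth } = await depth_estimator(url);

// The returned depth map is a RawImage, resized to the input image's dimensions
await depth.save("depth.png");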
New Contributors
- @shrirajh made their first contribution in #1122
- @pratapvardhan made their first contribution in #1116
Full Changelog: 3.2.3...3.2.4
3.2.3
What's new?
- Fix setting of model_file_name for image feature extraction pipeline in #1114. Thanks @xitanggg for reporting the issue!
- Add support for dinov2 with registers in #1110. Example usage:
import { pipeline } from '@huggingface/transformers';

// Create image classification pipeline
const classifier = await pipeline('image-classification', 'onnx-community/dinov2-with-registers-small-imagenet1k-1-layer');

// Classify an image
const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/cats.jpg';
const output = await classifier(url);
console.log(output);
// [
//   { label: 'tabby, tabby cat', score: 0.8135351538658142 },
//   { label: 'tiger cat', score: 0.08967583626508713 },
//   { label: 'Egyptian cat', score: 0.06800546497106552 },
//   { label: 'radiator', score: 0.003501888597384095 },
//   { label: 'quilt, comforter, comfort, puff', score: 0.003408448537811637 },
// ]
Full Changelog: 3.2.2...3.2.3
3.2.2
3.2.1
What's new?
- Add support for ModernBERT in #1104. Check out the blog post for more information!
Example:
import { pipeline } from '@huggingface/transformers';

const pipe = await pipeline('fill-mask', 'answerdotai/ModernBERT-base');

const answer = await pipe('The capital of France is [MASK].');
console.log(answer);
Full Changelog: 3.2.0...3.2.1