Releases: huggingface/transformers.js

2.7.0

23 Oct 15:52

What's new?

🗣️ New task: Text to speech/audio

Due to popular demand, we've added text-to-speech support to Transformers.js! 😍


You can get started in just a few lines of code!

import { pipeline } from '@xenova/transformers';

let speaker_embeddings = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/speaker_embeddings.bin';
let synthesizer = await pipeline('text-to-speech', 'Xenova/speecht5_tts', { quantized: false });
let out = await synthesizer('Hello, my dog is cute', { speaker_embeddings });
// {
//   audio: Float32Array(26112) [-0.00005657337896991521, 0.00020583874720614403, ...],
//   sampling_rate: 16000
// }

You can then save the audio to a .wav file with the wavefile package:

import wavefile from 'wavefile';
import fs from 'fs';

let wav = new wavefile.WaveFile();
wav.fromScratch(1, out.sampling_rate, '32f', out.audio);
fs.writeFileSync('out.wav', wav.toBuffer());

Alternatively, you can play the file in your browser (see below).
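
For reference, a minimal sketch of in-browser playback using the standard Web Audio API might look like this (out is the pipeline output from the example above):

// Play the Float32Array returned by the pipeline with the Web Audio API
const audioContext = new AudioContext({ sampleRate: out.sampling_rate });
const buffer = audioContext.createBuffer(1, out.audio.length, out.sampling_rate);
buffer.copyToChannel(out.audio, 0);

const source = audioContext.createBufferSource();
source.buffer = buffer;
source.connect(audioContext.destination);
source.start();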

Don't like the speaker's voice? Well, you can choose another from the >7000 speaker embeddings in the CMU Arctic dataset (see here)!

Note: currently, we only support TTS with SpeechT5, but in the future we'll add others like Bark and MMS!

🖥️ TTS demo and example app

To showcase the power of in-browser TTS, we're also releasing a simple example app (demo, code). Feel free to make improvements to it... and if you do (or end up building your own), please tag me on Twitter! 🤗


Misc. changes

  • Update falcon tokenizer in #344
  • Add more links to example section in #343
  • Improve electron example template in #342
  • Update example app dependencies in #347
  • Do not post-process < and > symbols generated from docs in #335

Full Changelog: 2.6.2...2.7.0

2.6.2

27 Sep 14:14

What's new?

📝 New task: Document Question Answering

Document Question Answering is the task of answering questions based on an image of a document. Document Question Answering models take a (document, question) pair as input and return an answer in natural language. Check out the docs for more info!


Example code
// npm i @xenova/transformers
import { pipeline } from '@xenova/transformers';

let image = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/invoice.png';
let question = 'What is the invoice number?';

// Create document question answering pipeline
let qa_pipeline = await pipeline('document-question-answering', 'Xenova/donut-base-finetuned-docvqa');

// Run the pipeline
let output = await qa_pipeline(image, question);
// [{ answer: 'us-001' }]

🤖 New models

  • Add support for DonutSwin models in #320
  • Add support for Blenderbot and BlenderbotSmall in #292
  • Add support for LongT5 models in #316

💻 New example application

  • In-browser semantic image search in #326 (demo, code, tweet)


🐛 Misc. improvements

  • Fixing more _call LSP errors + extra typings by @kungfooman in #304
  • Remove CustomCache requirement for example browser extension project in #325

Full Changelog: 2.6.1...2.6.2

2.6.1

18 Sep 13:40

What's new?

  • Add Vanilla JavaScript tutorial by @perborgen in #271. This includes an interactive video tutorial ("scrim"), which walks you through the code! Let us know if you want to see more of these video tutorials! 🤗


  • Add support for min_length and min_new_tokens generation parameters in #308 (see the sketch after this list)

  • Fix issues with minification in #307

  • Fix ByteLevel pretokenizer and improve whisper test cases in #287

  • Misc. documentation improvements by @rubiagatra in #293
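
Here's a minimal sketch of the new generation parameters from #308 (the model name is just an illustrative example):

import { pipeline } from '@xenova/transformers';

let generator = await pipeline('text-generation', 'Xenova/distilgpt2');
let output = await generator('Once upon a time,', {
    min_new_tokens: 20, // force the model to generate at least 20 new tokens
    max_new_tokens: 50,
});
// [{ generated_text: 'Once upon a time, ...' }]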

New Contributors

Full Changelog: 2.6.0...2.6.1

2.6.0

08 Sep 15:27

What's new?

🤯 14 new architectures

In this release, we've added a ton of new architectures: BLOOM, MPT, BeiT, CamemBERT, CodeLlama, GPT NeoX, GPT-J, HerBERT, mBART, mBART-50, OPT, ResNet, WavLM, and XLM. This brings the total number of supported architectures up to 46! Here's some example code to help you get started:

  • Text-generation with MPT (models):

    import { pipeline } from '@xenova/transformers';
    const generator = await pipeline('text-generation', 'Xenova/ipt-350m', {
        quantized: false, // use the unquantized version so the output matches the Python implementation
    });
    
    const output = await generator('La nostra azienda');
    // { generated_text: "La nostra azienda è specializzata nella vendita di prodotti per l'igiene orale e per la salute." }

    Other text-generation models: BLOOM, GPT-NeoX, CodeLlama, GPT-J, OPT.

  • CamemBERT for masked language modelling, text classification, token classification, question answering, and feature extraction (models). For example:

    import { pipeline } from '@xenova/transformers';
    let pipe = await pipeline('token-classification', 'Xenova/camembert-ner-with-dates');
    let output = await pipe("Je m'appelle jean-baptiste et j'habite à montréal depuis fevr 2012");
    // [
    //   { entity: 'I-PER', score: 0.9258053302764893, index: 5, word: 'jean' },
    //   { entity: 'I-PER', score: 0.9048717617988586, index: 6, word: '-' },
    //   { entity: 'I-PER', score: 0.9227054119110107, index: 7, word: 'ba' },
    //   { entity: 'I-PER', score: 0.9385354518890381, index: 8, word: 'pt' },
    //   { entity: 'I-PER', score: 0.9139659404754639, index: 9, word: 'iste' },
    //   { entity: 'I-LOC', score: 0.9877734780311584, index: 15, word: 'montré' },
    //   { entity: 'I-LOC', score: 0.9891639351844788, index: 16, word: 'al' },
    //   { entity: 'I-DATE', score: 0.9858269691467285, index: 18, word: 'fe' },
    //   { entity: 'I-DATE', score: 0.9780661463737488, index: 19, word: 'vr' },
    //   { entity: 'I-DATE', score: 0.980688214302063, index: 20, word: '2012' }
    // ]


  • WavLM for feature-extraction (models). For example:

    import { AutoProcessor, AutoModel, read_audio } from '@xenova/transformers';
    
    // Read and preprocess audio
    const processor = await AutoProcessor.from_pretrained('Xenova/wavlm-base');
    const audio = await read_audio('https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav', 16000);
    const inputs = await processor(audio);
    
    // Run model with inputs
    const model = await AutoModel.from_pretrained('Xenova/wavlm-base');
    const output = await model(inputs);
    // {
    //   last_hidden_state: Tensor {
    //     dims: [ 1, 549, 768 ],
    //     type: 'float32',
    //     data: Float32Array(421632) [-0.349443256855011, -0.39341306686401367,  0.022836603224277496, ...],
    //     size: 421632
    //   }
    // }
  • mBART + mBART-50 for multilingual translation (models). For example:

    import { pipeline } from '@xenova/transformers';
    let translator = await pipeline('translation', 'Xenova/mbart-large-50-many-to-many-mmt');
    let output = await translator('संयुक्त राष्ट्र के प्रमुख का कहना है कि सीरिया में कोई सैन्य समाधान नहीं है', {
      src_lang: 'hi_IN', // Hindi
      tgt_lang: 'fr_XX', // French
    });
    // [{ translation_text: 'Le chef des Nations affirme qu 'il n 'y a military solution in Syria.' }]

    See here for the full list of languages and their corresponding codes.

  • BeiT for image classification (models):

    import { pipeline } from '@xenova/transformers';
    let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg';
    let pipe = await pipeline('image-classification', 'Xenova/beit-base-patch16-224');
    let output = await pipe(url);
    // [{ label: 'tiger, Panthera tigris', score: 0.7168469429016113 }]
  • ResNet for image classification (models):

    import { pipeline } from '@xenova/transformers';
    let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg';
    let pipe = await pipeline('image-classification', 'Xenova/resnet-50');
    let output = await pipe(url);
    // [{ label: 'tiger, Panthera tigris', score: 0.7576608061790466 }]

😍 Over 150 newly-converted models

To get started with these new architectures (and expand coverage for other models), we're releasing over 150 new models on the Hugging Face Hub! Check out the full list here.


🏋️ HUGE reduction in model sizes (up to -40%)

Thanks to a recent update of 🤗 Optimum, we were able to remove duplicate weights across various models. In some cases, like whisper-tiny's decoder, this resulted in a 40% reduction in size! Here are some improvements we saw:

  • Whisper-tiny decoder: 50MB → 30MB (-40%)
  • NLLB decoder: 732MB → 476MB (-35%)
  • bloom: 819MB → 562MB (-31%)
  • T5 decoder: 59MB → 42MB (-28%)
  • distilbert-base: 91MB → 68MB (-25%)
  • bart-base decoder: 207MB → 155MB (-25%)
  • roberta-base: 165MB → 126MB (-24%)
  • gpt2: 167MB → 127MB (-24%)
  • bert-base: 134MB → 111MB (-17%)
  • many more!

Play around with some of the smaller whisper models (for automatic speech recognition) here!


Other

  • Transformers.js integration with LangChain JS (docs)

    import { HuggingFaceTransformersEmbeddings } from "langchain/embeddings/hf_transformers";
    
    const model = new HuggingFaceTransformersEmbeddings({
      modelName: "Xenova/all-MiniLM-L6-v2",
    });
    
    /* Embed queries */
    const res = await model.embedQuery(
      "What would be a good company name for a company that makes colorful socks?"
    );
    console.log({ res });
    /* Embed documents */
    const documentRes = await model.embedDocuments(["Hello world", "Bye bye"]);
    console.log({ documentRes });
  • Refactored PreTrainedModel to require significantly less code when adding new models

  • Typing improvements by @kungfooman

2.5.4

28 Aug 19:06

What's new?

  • Add support for 3 new vision architectures (Swin, DeiT, Yolos) in #262. Check out the Hugging Face Hub to see which models you can use!
    • Swin for image classification. e.g.:
      let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg';
      let classifier = await pipeline('image-classification', 'Xenova/swin-base-patch4-window7-224-in22k');
      let output = await classifier(url, { topk: null });
      // [
      //   { label: 'Bengal_tiger', score: 0.2258443683385849 },
      //   { label: 'tiger, Panthera_tigris', score: 0.21161635220050812 },
      //   { label: 'predator, predatory_animal', score: 0.09135803580284119 },
      //   { label: 'tigress', score: 0.08038495481014252 },
      //   // ... 21838 more items
      // ]
    • DeiT for image classification. e.g.:
      let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg';
      let classifier = await pipeline('image-classification', 'Xenova/deit-tiny-distilled-patch16-224');
      let output = await classifier(url);
      // [{ label: 'tiger, Panthera tigris', score: 0.9804046154022217 }]
    • Yolos for object detection. e.g.:
      let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/cats.jpg';
      let detector = await pipeline('object-detection', 'Xenova/yolos-small-300');
      let output = await detector(url);
      // [
      //   { label: 'remote', score: 0.9837935566902161, box: { xmin: 331, ymin: 80, xmax: 367, ymax: 192 } },
      //   { label: 'cat', score: 0.94994056224823, box: { xmin: 8, ymin: 57, xmax: 316, ymax: 470 } },
      //   { label: 'couch', score: 0.9843178987503052, box: { xmin: 0, ymin: 0, xmax: 639, ymax: 474 } },
      //   { label: 'remote', score: 0.9704685211181641, box: { xmin: 39, ymin: 71, xmax: 179, ymax: 114 } },
      //   { label: 'cat', score: 0.9921762943267822, box: { xmin: 339, ymin: 17, xmax: 642, ymax: 380 } }
      // ]
  • Documentation improvements by @perborgen in #261

New contributors 🤗

Full Changelog: 2.5.3...2.5.4

2.5.3

22 Aug 21:52

What's new?

  • Fix whisper timestamps for non-English languages in #253 (see the sketch after this list)
  • Fix caching for some LFS files from the Hugging Face Hub in #251
  • Improve documentation (w/ example code and links) in #255 and #257. Thanks @josephrocca for helping with this!
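
For reference, here's a minimal sketch of requesting timestamps from a multilingual Whisper model (the model name is illustrative; the audio URL is the same sample used elsewhere in these notes):

import { pipeline } from '@xenova/transformers';

let transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-tiny');
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav';
let output = await transcriber(url, {
    return_timestamps: true,
    // For non-English audio, you can also set options like language: 'french' and task: 'transcribe'.
});
// { text: '...', chunks: [{ timestamp: [0, 8], text: '...' }, ...] }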

New contributors 🤗

Full Changelog: 2.5.2...2.5.3

2.5.2

14 Aug 21:25

What's new?

  • Add audio-classification with MMS and Wav2Vec2 in #220. Example usage:
    // npm i @xenova/transformers
    import { pipeline } from '@xenova/transformers';
    
    // Create audio classification pipeline
    let classifier = await pipeline('audio-classification', 'Xenova/mms-lid-4017');
    
    // Run inference
    let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jeanNL.wav';
    let output = await classifier(url);
    // [
    //   { label: 'fra', score: 0.9995712041854858 },
    //   { label: 'hat', score: 0.00003788191679632291 },
    //   { label: 'lin', score: 0.00002646935718075838 },
    //   { label: 'hun', score: 0.000015628289474989288 },
    //   { label: 'bre', score: 0.000007014674793026643 }
    // ]
  • Add automatic-speech-recognition support for Wav2Vec2 models in #220 (MMS coming soon); see the sketch after this list.
  • Add support for multi-label classification problem type in #249. Thanks @KiterWork for reporting!
  • Add M2M100 tokenizer in #250. Thanks @AAnirudh07 for the feature request!
  • Documentation improvements
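
For reference, a minimal sketch of Wav2Vec2 speech recognition from #220 might look like this (the model name is illustrative; check the Hugging Face Hub for available Wav2Vec2 checkpoints):

// npm i @xenova/transformers
import { pipeline } from '@xenova/transformers';

// Create automatic speech recognition pipeline
let transcriber = await pipeline('automatic-speech-recognition', 'Xenova/wav2vec2-base-960h');

// Run inference on an audio file
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav';
let output = await transcriber(url);
// { text: '...' }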

New Contributors

Full Changelog: 2.5.1...2.5.2

2.5.1

09 Aug 20:48

What's new?

  • Add support for Llama/Llama2 models in #232
  • Tokenization performance improvements in #234 (+ The Tokenizer Playground example app)
  • Add support for DeBERTa/DeBERTa-v2 models in #244 (see the sketch after this list)
  • Documentation improvements for zero-shot-classification pipeline (link)
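
For reference, here's a minimal sketch that ties the last two items together: zero-shot classification with a DeBERTa-v3 NLI checkpoint (the model name is illustrative; check the Hub for available options):

import { pipeline } from '@xenova/transformers';

let classifier = await pipeline('zero-shot-classification', 'Xenova/nli-deberta-v3-xsmall');
let output = await classifier(
    'I just bought tickets to see my favourite band live next month!',
    ['music', 'sports', 'politics'],
);
// { sequence: '...', labels: [...], scores: [...] }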

Full Changelog: 2.5.0...2.5.1

2.5.0

01 Aug 13:08

What's new?

Support for computing CLIP image and text embeddings separately (#227)

You can now compute CLIP text and vision embeddings separately, allowing for faster inference when you only need to query one of the modalities. We've also released a demo application for semantic image search to showcase this functionality.

Example: Compute text embeddings with CLIPTextModelWithProjection.

import { AutoTokenizer, CLIPTextModelWithProjection } from '@xenova/transformers';

// Load tokenizer and text model
const tokenizer = await AutoTokenizer.from_pretrained('Xenova/clip-vit-base-patch16');
const text_model = await CLIPTextModelWithProjection.from_pretrained('Xenova/clip-vit-base-patch16');

// Run tokenization
let texts = ['a photo of a car', 'a photo of a football match'];
let text_inputs = tokenizer(texts, { padding: true, truncation: true });

// Compute embeddings
const { text_embeds } = await text_model(text_inputs);
// Tensor {
//   dims: [ 2, 512 ],
//   type: 'float32',
//   data: Float32Array(1024) [ ... ],
//   size: 1024
// }

Example: Compute vision embeddings with CLIPVisionModelWithProjection.

import { AutoProcessor, CLIPVisionModelWithProjection, RawImage } from '@xenova/transformers';

// Load processor and vision model
const processor = await AutoProcessor.from_pretrained('Xenova/clip-vit-base-patch16');
const vision_model = await CLIPVisionModelWithProjection.from_pretrained('Xenova/clip-vit-base-patch16');

// Read image and run processor
let image = await RawImage.read('https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/football-match.jpg');
let image_inputs = await processor(image);

// Compute embeddings
const { image_embeds } = await vision_model(image_inputs);
// Tensor {
//   dims: [ 1, 512 ],
//   type: 'float32',
//   data: Float32Array(512) [ ... ],
//   size: 512
// }
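
With both embeddings computed, ranking images against a text query reduces to a similarity comparison. Here's a minimal sketch (continuing from the two examples above, with a plain cosine-similarity helper):

// Plain cosine-similarity helper
function cosineSimilarity(a, b) {
    let dot = 0, normA = 0, normB = 0;
    for (let i = 0; i < a.length; ++i) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Compare each text embedding with the image embedding
const [numTexts, dim] = text_embeds.dims; // [2, 512]
for (let i = 0; i < numTexts; ++i) {
    const textVector = text_embeds.data.slice(i * dim, (i + 1) * dim);
    console.log(texts[i], cosineSimilarity(textVector, image_embeds.data));
}
// The football-match prompt should score higher than the car prompt for this image.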

Improved browser extension example/template (#196)

We've updated the source code for our example browser extension, making the following improvements:

  1. Custom model caching - meaning you don't need to ship the model weights with the extension. In addition to a smaller bundle size, users won't need to redownload the weights when the extension updates! (A rough sketch of wiring up a custom cache follows this list.)
  2. Use ES6 module syntax (vs. CommonJS) - much cleaner code!
  3. Persistent service worker - fixed an issue where the service worker would go to sleep after a period of inactivity.
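
For reference, here's a rough sketch of wiring up a custom cache. It assumes the env.useCustomCache / env.customCache settings and a cache object exposing match and put; a real extension would persist entries to extension storage rather than an in-memory Map:

import { env } from '@xenova/transformers';

// Minimal in-memory cache implementing the match/put interface expected by env.customCache
const store = new Map();

env.useBrowserCache = false;
env.useCustomCache = true;
env.customCache = {
    async match(request) {
        const key = typeof request === 'string' ? request : request.url;
        const cached = store.get(key);
        return cached ? new Response(cached) : undefined;
    },
    async put(request, response) {
        const key = typeof request === 'string' ? request : request.url;
        store.set(key, await response.arrayBuffer());
    },
};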

Summary of updates since the last minor release (2.4.0):

Misc bug fixes and improvements

  • Fixed floating-point-precision edge-case for resizing images
  • Fixed RawImage.save()
  • BPE tokenization for weird whitespace characters (#208)

2.4.4

28 Jul 11:59

What's new?

Full Changelog: 2.4.3...2.4.4