
Releases: huggingface/transformers.js

2.13.2

03 Jan 14:57

What's new?

This release is a follow-up to #485, with additional intellisense-focused improvements (see PR).
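
For instance, here's a rough sketch of the kind of editor support these typings enable (the task and model below are purely illustrative): hovering over `classifier` in an editor that picks up the bundled type declarations should show the concrete pipeline class (e.g. TextClassificationPipeline) together with its documented example usage, rather than a generic Pipeline.

import { pipeline } from '@xenova/transformers';

// With the improved typings, the task string narrows the return type, so editors
// can surface the specific pipeline class and its JSDoc examples on hover.
const classifier = await pipeline('text-classification', 'Xenova/distilbert-base-uncased-finetuned-sst-2-english');

const output = await classifier('I love transformers!');
console.log(output); // e.g. [{ label: 'POSITIVE', score: 0.99... }]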


Full Changelog: 2.13.1...2.13.2

2.13.1

03 Jan 11:24

What's new?

  • Improve typing of pipeline function in #485. Thanks to @wesbos for the suggestion!


    This also means when you hover over the class name, you'll get example code to help you out.

  • Add phi-1_5 model in #493.

    See example code
    import { pipeline } from '@xenova/transformers';
    
    // Create a text-generation pipeline
    const generator = await pipeline('text-generation', 'Xenova/phi-1_5_dev');
    
    // Construct prompt
    const prompt = `\`\`\`py
    import math
    def print_prime(n):
        """
        Print all primes between 1 and n
        """`;
    
    // Generate text
    const result = await generator(prompt, {
      max_new_tokens: 100,
    });
    console.log(result[0].generated_text);

    Results in:

    import math
    def print_prime(n):
        """
        Print all primes between 1 and n
        """
        primes = []
        for num in range(2, n+1):
            is_prime = True
            for i in range(2, int(math.sqrt(num))+1):
                if num % i == 0:
                    is_prime = False
                    break
            if is_prime:
                primes.append(num)
        print(primes)
    
    print_prime(20)

    Running the code produces the correct result:

    [2, 3, 5, 7, 11, 13, 17, 19]
    

Full Changelog: 2.13.0...2.13.1

2.13.0

27 Dec 15:00

What's new?

🎄 7 new architectures!

This release adds support for many new multimodal architectures, bringing the total number of supported architectures to 80! 🤯

1. VITS for multilingual text-to-speech across over 1000 languages! (#466)

import { pipeline } from '@xenova/transformers';

// Create English text-to-speech pipeline
const synthesizer = await pipeline('text-to-speech', 'Xenova/mms-tts-eng');

// Generate speech
const output = await synthesizer('I love transformers');
// {
//   audio: Float32Array(26112) [...],
//   sampling_rate: 16000
// }

See here for the list of available models. To start, we've converted 12 of the ~1140 models on the Hugging Face Hub. If we haven't added the one you wish to use, you can make it web-ready using our conversion script.
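
To go a step further, here's a hedged sketch of generating speech in another language and saving it to a WAV file in Node.js. It assumes the French checkpoint Xenova/mms-tts-fra is among the converted models (check the linked list) and uses the third-party wavefile package to serialize the Float32Array output:

import { pipeline } from '@xenova/transformers';
import wavefile from 'wavefile';
import fs from 'fs';

// Create a French text-to-speech pipeline (assumes this checkpoint has been converted)
const synthesizer = await pipeline('text-to-speech', 'Xenova/mms-tts-fra');

// Generate speech
const output = await synthesizer('Bonjour tout le monde');

// Write the raw Float32Array samples to a 32-bit float WAV file
const wav = new wavefile.WaveFile();
wav.fromScratch(1, output.sampling_rate, '32f', output.audio);
fs.writeFileSync('bonjour.wav', wav.toBuffer());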

2. CLIPSeg for zero-shot image segmentation. (#478)

import { AutoTokenizer, AutoProcessor, CLIPSegForImageSegmentation, RawImage } from '@xenova/transformers';

// Load tokenizer, processor, and model
const tokenizer = await AutoTokenizer.from_pretrained('Xenova/clipseg-rd64-refined');
const processor = await AutoProcessor.from_pretrained('Xenova/clipseg-rd64-refined');
const model = await CLIPSegForImageSegmentation.from_pretrained('Xenova/clipseg-rd64-refined');

// Run tokenization
const texts = ['a glass', 'something to fill', 'wood', 'a jar'];
const text_inputs = tokenizer(texts, { padding: true, truncation: true });

// Read image and run processor
const image = await RawImage.read('https://github.com/timojl/clipseg/blob/master/example_image.jpg?raw=true');
const image_inputs = await processor(image);

// Run model with both text and pixel inputs
const { logits } = await model({ ...text_inputs, ...image_inputs });
// logits: Tensor {
//   dims: [4, 352, 352],
//   type: 'float32',
//   data: Float32Array(495616)[ ... ],
//   size: 495616
// }

You can visualize the predictions as follows:

const preds = logits
  .unsqueeze_(1)
  .sigmoid_()
  .mul_(255)
  .round_()
  .to('uint8');

for (let i = 0; i < preds.dims[0]; ++i) {
  const img = RawImage.fromTensor(preds[i]);
  img.save(`prediction_${i}.png`);
}
Original "a glass" "something to fill" "wood" "a jar"
image prediction_0 prediction_1 prediction_2 prediction_3

See here for the list of available models.

3. SegFormer for semantic segmentation and image classification. (#480)

import { pipeline } from '@xenova/transformers';

// Create an image segmentation pipeline
const segmenter = await pipeline('image-segmentation', 'Xenova/segformer_b2_clothes');

// Segment an image
const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/young-man-standing-and-leaning-on-car.jpg';
const output = await segmenter(url);


See output
[
  {
    score: null,
    label: 'Background',
    mask: RawImage {
      data: [Uint8ClampedArray],
      width: 970,
      height: 1455,
      channels: 1
    }
  },
  {
    score: null,
    label: 'Hair',
    mask: RawImage {
      data: [Uint8ClampedArray],
      width: 970,
      height: 1455,
      channels: 1
    }
  },
  {
    score: null,
    label: 'Upper-clothes',
    mask: RawImage {
      data: [Uint8ClampedArray],
      width: 970,
      height: 1455,
      channels: 1
    }
  },
  {
    score: null,
    label: 'Pants',
    mask: RawImage {
      data: [Uint8ClampedArray],
      width: 970,
      height: 1455,
      channels: 1
    }
  },
  {
    score: null,
    label: 'Left-shoe',
    mask: RawImage {
      data: [Uint8ClampedArray],
      width: 970,
      height: 1455,
      channels: 1
    }
  },
  {
    score: null,
    label: 'Right-shoe',
    mask: RawImage {
      data: [Uint8ClampedArray],
      width: 970,
      height: 1455,
      channels: 1
    }
  },
  {
    score: null,
    label: 'Face',
    mask: RawImage {
      data: [Uint8ClampedArray],
      width: 970,
      height: 1455,
      channels: 1
    }
  },
  {
    score: null,
    label: 'Left-leg',
    mask: RawImage {
      data: [Uint8ClampedArray],
      width: 970,
      height: 1455,
      channels: 1
    }
  },
  {
    score: null,
    label: 'Right-leg',
    mask: RawImage {
      data: [Uint8ClampedArray],
      width: 970,
      height: 1455,
      channels: 1
    }
  },
  {
    score: null,
    label: 'Left-arm',
    mask: RawImage {
      data: [Uint8ClampedArray],
      width: 970,
      height: 1455,
      channels: 1
    }
  },
  {
    score: null,
    label: 'Right-arm',
    mask: RawImage {
      data: [Uint8ClampedArray],
      width: 970,
      height: 1455,
      channels: 1
    }
  }
]

See here for the list of available models.

4. Table Transformer for table extraction from unstructured documents. (#477)

import { pipeline } from '@xenova/transformers';

// Create an object detection pipeline
const detector = await pipeline('object-detection', 'Xenova/table-transformer-detection', { quantized: false });

// Detect tables in an image
const img = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/invoice-with-table.png';
const output = await detector(img);
// [{ score: 0.9967531561851501, label: 'table', box: { xmin: 52, ymin: 322, xmax: 546, ymax: 525 } }]

See here for the list of available models.
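
Beyond detecting where tables are, the same object-detection pipeline can be pointed at a structure-recognition checkpoint to locate rows and columns. A hedged sketch, assuming Xenova/table-transformer-structure-recognition appears in the linked model list (for best results you would first crop the image to a detected table):

import { pipeline } from '@xenova/transformers';

// Create an object detection pipeline for table structure recognition
const structure_detector = await pipeline('object-detection', 'Xenova/table-transformer-structure-recognition', { quantized: false });

// Detect rows and columns (ideally on an image cropped to a single table)
const img = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/invoice-with-table.png';
const cells = await structure_detector(img, { threshold: 0.5 });
// e.g. objects labelled 'table row', 'table column', etc., each with a score and bounding box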

5. DiT for document image classification. (#474)

import { pipeline } from '@xenova/transformers';

// Create an image classification pipeline
const classifier = await pipeline('image-classification', 'Xenova/dit-base-finetuned-rvlcdip');

// Classify an image 
const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/coca_cola_advertisement.png';
const output = await classifier(url);
// [{ label: 'advertisement', score: 0.9035086035728455 }]

See here for the list of available models.

6. SigLIP for zero-shot image classification. (#473)

import { pipeline } from '@xenova/transformers';

// Create a zero-shot image classification pipeline
const classifier = await pipeline('zero-shot-image-classification', 'Xenova/siglip-base-patch16-224');

// Classify images according to provided labels
const url = 'http://images.cocodataset.org/val2017/000000039769.jpg';
const output = await classifier(url, ['2 cats', '2 dogs'], {
    hypothesis_template: 'a photo of {}',
});
// [
//   { score: 0.16770583391189575, label: '2 cats' },
//   { score: 0.000022096000975579955, label: '2 dogs' }
// ]

See here for the list of available models.

7. RoFormer for masked language modelling, sequence classification, token classification, and question answering. (#464)

import { pipeline } from '@xenova/transformers';

// Create a masked language modelling pipeline
const pipe = await pipeline('fill-mask', 'Xenova/antiberta2');

// Predict missing token
const output = await pipe('Ḣ Q V Q ... C A [MASK] D ... T V S S');
See output
[
  {
    score: 0.48774364590644836,
    token: 19,
    token_str: 'R',
    sequence: 'Ḣ Q V Q C A R D T V S S'
  },
  {
    score: 0.2768442928791046,
    token: 18,
    token_str: 'Q...

2.12.1

18 Dec 21:30

What's new?

Patch for release 2.12.0, making @huggingface/jinja a dependency instead of a peer dependency. This also means apply_chat_template is now synchronous (and does not lazily load the module). We may reintroduce lazy loading in the future, but for now it causes issues when loading from a CDN.

import { AutoTokenizer } from "@xenova/transformers";

// Load tokenizer from the Hugging Face Hub
const tokenizer = await AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1");

// Define chat messages
const chat = [
  { role: "user", content: "Hello, how are you?" },
  { role: "assistant", content: "I'm doing great. How can I help you today?" },
  { role: "user", content: "I'd like to show off how chat templating works!" },
]

const text = tokenizer.apply_chat_template(chat, { tokenize: false });
// "<s>[INST] Hello, how are you? [/INST]I'm doing great. How can I help you today?</s> [INST] I'd like to show off how chat templating works! [/INST]"

const input_ids = tokenizer.apply_chat_template(chat, { tokenize: true, return_tensor: false });
// [1, 733, 16289, 28793, 22557, 28725, 910, 460, 368, 28804, 733, 28748, 16289, 28793, ...]

Full Changelog: 2.12.0...2.12.1

2.12.0

18 Dec 16:13

What's new?

💬 Chat templates!

This release adds support for chat templates, a highly requested feature that enables users to convert conversations (represented as a list of chat objects) into a single tokenizable string, in the format that the model expects. As you may know, chat templates can vary greatly across model types, so it was important to design a system that (1) supports complex chat templates, (2) is generalizable, and (3) is easy to use. So, how did we do it? 🤔

This is made possible with @huggingface/jinja, a minimalistic JavaScript implementation of the Jinja templating engine, that we created to align with how transformers handles templating. Although it was originally designed for parsing and rendering ChatML templates, we decided to separate out the templating logic into an external (optional) library due to its usefulness in other types of applications. Special thanks to @tlaceby for his amazing "Guide to Interpreters" series, which provided the basis for our implementation. 🤗
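
To give a sense of how the library can be used on its own, here's a minimal sketch of rendering a template directly with @huggingface/jinja (via its exported Template class):

import { Template } from "@huggingface/jinja";

// Compile a Jinja template and render it with a context object
const template = new Template("Hello, {{ name }}! You have {{ n_messages }} new messages.");
const rendered = template.render({ name: "world", n_messages: 3 });
console.log(rendered); // "Hello, world! You have 3 new messages."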

Anyway, let's take a look at an example:

import { AutoTokenizer } from "@xenova/transformers";

// Load tokenizer from the Hugging Face Hub
const tokenizer = await AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1");

// Define chat messages
const chat = [
  { role: "user", content: "Hello, how are you?" },
  { role: "assistant", content: "I'm doing great. How can I help you today?" },
  { role: "user", content: "I'd like to show off how chat templating works!" },
]

const text = tokenizer.apply_chat_template(chat, { tokenize: false });
// "<s>[INST] Hello, how are you? [/INST]I'm doing great. How can I help you today?</s> [INST] I'd like to show off how chat templating works! [/INST]"

Notice how the entire chat is condensed into a single string. If you would instead like to return the tokenized version (i.e., a list of token IDs), you can use the following:

const input_ids = tokenizer.apply_chat_template(chat, { tokenize: true, return_tensor: false });
// [1, 733, 16289, 28793, 22557, 28725, 910, 460, 368, 28804, 733, 28748, 16289, 28793, 28737, 28742, 28719, 2548, 1598, 28723, 1602, 541, 315, 1316, 368, 3154, 28804, 2, 28705, 733, 16289, 28793, 315, 28742, 28715, 737, 298, 1347, 805, 910, 10706, 5752, 1077, 3791, 28808, 733, 28748, 16289, 28793]

For more information about chat templates, check out the transformers documentation.

🐛 Bug fixes

  • Fixed incorrect encoding/decoding of whitespace around special characters with Fast Llama tokenizers. These bugs will also soon be fixed in the transformers library. For backwards compatibility, a tokenizer exported with the legacy behaviour will continue to act the same way unless explicitly set otherwise; newer exports won't be affected. To override this default, either to keep the legacy behaviour or to upgrade to the fixed version, you can do so as follows:

    import { AutoTokenizer } from '@xenova/transformers';
    
    // Use the default behaviour (specified in tokenizer_config.json, which in this case is `{legacy: false}`).
    const tokenizer = await AutoTokenizer.from_pretrained('Xenova/llama2-tokenizer');
    const { input_ids } = tokenizer('<s>\n', { add_special_tokens: false, return_tensor: false });
    console.log(input_ids); // [1, 13]
    
    // Use the legacy behaviour
    const legacy_tokenizer = await AutoTokenizer.from_pretrained('Xenova/llama2-tokenizer', { legacy: true });
    const { input_ids: legacy_input_ids } = legacy_tokenizer('<s>\n', { add_special_tokens: false, return_tensor: false });
    console.log(legacy_input_ids); // [1, 29871, 13]
  • Strip whitespace around special tokens for wav2vec tokenizers.

🔨 Improvements

  • More comprehensive tokenizer test suite: including both static and dynamic tokenizer tests for encoding, decoding, and chat templates.

Full Changelog: 2.11.0...2.12.0

2.11.0

13 Dec 14:09

What's new?

🤯 8 new architectures!

This release adds support for a bunch of new model architectures, covering a wide range of use cases! In total, we now support 73 different model architectures!

1. ViTMatte for image matting (#448). See here for the list of available models.

Example: Image matting w/ Xenova/vitmatte-small-distinctions-646.

import { AutoProcessor, VitMatteForImageMatting, RawImage } from '@xenova/transformers';

// Load processor and model
const processor = await AutoProcessor.from_pretrained('Xenova/vitmatte-small-distinctions-646');
const model = await VitMatteForImageMatting.from_pretrained('Xenova/vitmatte-small-distinctions-646');

// Load image and trimap
const image = await RawImage.fromURL('https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/vitmatte_image.png');
const trimap = await RawImage.fromURL('https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/vitmatte_trimap.png');

// Prepare image + trimap for the model
const inputs = await processor(image, trimap);

// Predict alpha matte
const { alphas } = await model(inputs);
// Tensor {
//   dims: [ 1, 1, 640, 960 ],
//   type: 'float32',
//   size: 614400,
//   data: Float32Array(614400) [ 0.9894027709960938, 0.9970508813858032, ... ]
// }
Visualization code
import { Tensor, cat } from '@xenova/transformers';

// Visualize predicted alpha matte
const imageTensor = new Tensor(
  'uint8',
  new Uint8Array(image.data),
  [image.height, image.width, image.channels]
).transpose(2, 0, 1);

// Convert float (0-1) alpha matte to uint8 (0-255)
const alphaChannel = alphas
  .squeeze(0)
  .mul_(255)
  .clamp_(0, 255)
  .round_()
  .to('uint8');

// Concatenate original image with predicted alpha
const imageData = cat([imageTensor, alphaChannel], 0);

// Save output image
const outputImage = RawImage.fromTensor(imageData);
outputImage.save('output.png');

Inputs: (Images: the input image and its trimap)

Outputs: (Images: the predicted alpha matte from the quantized and unquantized models)

2. ESM for protein sequence feature-extraction, masked language modelling, token classification, and zero-shot classification (#447). See here for the list of available models.

Example: Protein sequence classification w/ Xenova/esm2_t6_8M_UR50D_sequence_classifier_v1.

import { pipeline } from '@xenova/transformers';

// Create text classification pipeline
const classifier = await pipeline('text-classification', 'Xenova/esm2_t6_8M_UR50D_sequence_classifier_v1');

// Suppose these are your new sequences that you want to classify
// Additional Family 0: Enzymes
const new_sequences_0 = [ 'ACGYLKTPKLADPPVLRGDSSVTKAICKPDPVLEK', 'GVALDECKALDYLPGKPLPMDGKVCQCGSKTPLRP', 'VLPGYTCGELDCKPGKPLPKCGADKTQVATPFLRG', 'TCGALVQYPSCADPPVLRGSDSSVKACKKLDPQDK', 'GALCEECKLCPGADYKPMDGDRLPAAATSKTRPVG', 'PAVDCKKALVYLPKPLPMDGKVCRGSKTPKTRPYG', 'VLGYTCGALDCKPGKPLPKCGADKTQVATPFLRGA', 'CGALVQYPSCADPPVLRGSDSSVKACKKLDPQDKT', 'ALCEECKLCPGADYKPMDGDRLPAAATSKTRPVGK', 'AVDCKKALVYLPKPLPMDGKVCRGSKTPKTRPYGR' ]

// Additional Family 1: Receptor Proteins
const new_sequences_1 = [ 'VGQRFYGGRQKNRHCELSPLPSACRGSVQGALYTD', 'KDQVLTVPTYACRCCPKMDSKGRVPSTLRVKSARS', 'PLAGVACGRGLDYRCPRKMVPGDLQVTPATQRPYG', 'CGVRLGYPGCADVPLRGRSSFAPRACMKKDPRVTR', 'RKGVAYLYECRKLRCRADYKPRGMDGRRLPKASTT', 'RPTGAVNCKQAKVYRGLPLPMMGKVPRVCRSRRPY', 'RLDGGYTCGQALDCKPGRKPPKMGCADLKSTVATP', 'LGTCRKLVRYPQCADPPVMGRSSFRPKACCRQDPV', 'RVGYAMCSPKLCSCRADYKPPMGDGDRLPKAATSK', 'QPKAVNCRKAMVYRPKPLPMDKGVPVCRSKRPRPY' ]

// Additional Family 2: Structural Proteins
const new_sequences_2 = [ 'VGKGFRYGSSQKRYLHCQKSALPPSCRRGKGQGSAT', 'KDPTVMTVGTYSCQCPKQDSRGSVQPTSRVKTSRSK', 'PLVGKACGRSSDYKCPGQMVSGGSKQTPASQRPSYD', 'CGKKLVGYPSSKADVPLQGRSSFSPKACKKDPQMTS', 'RKGVASLYCSSKLSCKAQYSKGMSDGRSPKASSTTS', 'RPKSAASCEQAKSYRSLSLPSMKGKVPSKCSRSKRP', 'RSDVSYTSCSQSKDCKPSKPPKMSGSKDSSTVATPS', 'LSTCSKKVAYPSSKADPPSSGRSSFSMKACKKQDPPV', 'RVGSASSEPKSSCSVQSYSKPSMSGDSSPKASSTSK', 'QPSASNCEKMSSYRPSLPSMSKGVPSSRSKSSPPYQ' ]

// Merge all sequences
const new_sequences = [...new_sequences_0, ...new_sequences_1, ...new_sequences_2];

// Get the predicted class for each sequence
const predictions = await classifier(new_sequences);

// Output the predicted class for each sequence
for (let i = 0; i < predictions.length; ++i) {
    console.log(`Sequence: ${new_sequences[i]}, Predicted class: '${predictions[i].label}'`)
}
// Sequence: ACGYLKTPKLADPPVLRGDSSVTKAICKPDPVLEK, Predicted class: 'Enzymes'
// ... (truncated)
// Sequence: AVDCKKALVYLPKPLPMDGKVCRGSKTPKTRPYGR, Predicted class: 'Enzymes'
// Sequence: VGQRFYGGRQKNRHCELSPLPSACRGSVQGALYTD, Predicted class: 'Receptor Proteins'
// ... (truncated)
// Sequence: QPKAVNCRKAMVYRPKPLPMDKGVPVCRSKRPRPY, Predicted class: 'Receptor Proteins'
// Sequence: VGKGFRYGSSQKRYLHCQKSALPPSCRRGKGQGSAT, Predicted class: 'Structural Proteins'
// ... (truncated)
// Sequence: QPSASNCEKMSSYRPSLPSMSKGVPSSRSKSSPPYQ, Predicted class: 'Structural Proteins'

3. Hubert for audio classification, and automatic speech recognition (#449). See here for the list of available models.

Example: Speech command recognition w/ Xenova/hubert-base-superb-ks.

import { pipeline } from '@xenova/transformers';

// Create audio classification pipeline
const classifier = await pipeline('audio-classification', 'Xenova/hubert-base-superb-ks');

// Classify audio
const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/speech-commands_down.wav';
const output = await classifier(url, { topk: 5 });
// [
//   { label: 'down', score: 0.9954305291175842 },
//   { label: 'go', score: 0.004518700763583183 },
//   { label: '_unknown_', score: 0.00005029444946558215 },
//   { label: 'no', score: 4.877569494965428e-7 },
//   { label: 'stop', score: 5.504634081887616e-9 }
// ]

Example: Perform automatic speech recognition w/ Xenova/hubert-large-ls960-ft.

import { pipeline } from '@xenova/transformers';

// Create automatic speech recognition pipeline
const transcriber = await pipeline('automatic-speech-recognition', 'Xenova/hubert-large-ls960-ft');

// Transcribe audio
const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav';
const output = await transcriber(url);
// { text: 'AND SO MY FELLOW AMERICA ASK NOT WHAT YOUR COUNTRY CAN DO FOR YOU ASK WHAT YOU CAN DO FOR YOUR COUNTRY' }

4. Chinese-CLIP for zero-shot image classification (#455). See here for the list of available models.

Example: Zero-shot image classification w/ Xenova/chinese-clip-vit-base-patch16.

import { pipeline } from '@xenova/transformers';

// Create zero-shot image classification pipeline
const classifier = await pipeline('zero-shot-image-classification', 'Xenova/chinese-clip-vit-base-patch16');

// Set image url and candidate labels
const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/pikachu.png';
const candidate_labels = ['杰尼龟', '妙蛙种子', '小火龙', '皮卡丘'] // Squirtle, Bulbasaur, Charmander, Pikachu in Chinese

// Classify image
const output = await classifier(url, candidate_labels);
console.log(output);
// [
//   { score: 0.9926728010177612, label: '皮卡丘' },        // Pikachu
//   { score: 0.003480620216578245, label: '妙蛙种子' },    // Bulbasaur
//   { score: 0.001942147733643651, label: '杰尼龟' },      // Squirtle
//   { score: 0.0019044597866013646, label: '小火龙' }      // Charmander
// ]

5. DINOv2 for image classification (#444). See here for the list of available models.

Example: Image classification w/ Xenova/dinov2-small-imagenet1k-1-layer.

import { pipeline } from '@xenova/transformers';

// Create image classification pipeline
const classifier = await pipeline('image-classification', 'Xenova/dinov2-small-imagenet1k-1-layer');

// Classify an image
const url = 'http://images.cocodataset.org/val2017/000000039769.jpg';
const output = await classifier(url);
console.log(output);
// [{ label: 'tabby, tabby cat', score: 0.8088238835334778 }]

6. ConvBERT for feature extraction (#445). See [here](https://huggingface.co/models?library=transforme...


2.10.1

06 Dec 17:21

What's new?

🐛 Bug fixes

  • Fix zero-shot-object-detection {percentage: true} in #434. Thanks to @tobiascornille for reporting the issue!
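
    For reference, here's a hedged sketch of the option in question, reusing the OwlViT checkpoint from the 2.9.0 notes below; with percentage: true, bounding boxes are reported relative to the image size rather than in pixels:

    import { pipeline } from '@xenova/transformers';
    
    // Create a zero-shot object detection pipeline
    const detector = await pipeline('zero-shot-object-detection', 'Xenova/owlvit-base-patch32');
    
    // Predict bounding boxes, returned relative to the image dimensions
    const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/astronaut.png';
    const output = await detector(url, ['human face', 'rocket', 'helmet', 'american flag'], {
      percentage: true,
    });
    // Same labels and scores as usual, but each `box` is now expressed relative to the image size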

🛠️ Misc. improvements

  • Documentation improvements and new GitHub issues templates in #299
  • Standardize HF_ACCESS_TOKEN -> HF_TOKEN environment variables in #431

Full Changelog: 2.10.0...2.10.1

2.10.0

05 Dec 14:09

What's new?

🎵 New task: Zero-shot audio classification

The task of classifying audio into classes that are unseen during training. See here for more information.

Example: Perform zero-shot audio classification with Xenova/clap-htsat-unfused.

import { pipeline } from '@xenova/transformers';

// Create a zero-shot audio classification pipeline
const classifier = await pipeline('zero-shot-audio-classification', 'Xenova/clap-htsat-unfused');

const audio = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/dog_barking.wav';
const candidate_labels = ['dog', 'vacuum cleaner'];
const scores = await classifier(audio, candidate_labels);
// [
//   { score: 0.9993992447853088, label: 'dog' },
//   { score: 0.0006007603369653225, label: 'vacuum cleaner' }
// ]

💻 New architectures: CLAP, Audio Spectrogram Transformer, ConvNeXT, and ConvNeXT-v2

We added support for 4 new architectures, bringing the total up to 65!

  1. CLAP for zero-shot audio classification, text embeddings, and audio embeddings (#427). See here for the list of available models.

    • Zero-shot audio classification (same as above)

    • Text embeddings with Xenova/clap-htsat-unfused:

      import { AutoTokenizer, ClapTextModelWithProjection } from '@xenova/transformers';
      
      // Load tokenizer and text model
      const tokenizer = await AutoTokenizer.from_pretrained('Xenova/clap-htsat-unfused');
      const text_model = await ClapTextModelWithProjection.from_pretrained('Xenova/clap-htsat-unfused');
      
      // Run tokenization
      const texts = ['a sound of a cat', 'a sound of a dog'];
      const text_inputs = tokenizer(texts, { padding: true, truncation: true });
      
      // Compute embeddings
      const { text_embeds } = await text_model(text_inputs);
      // Tensor {
      //   dims: [ 2, 512 ],
      //   type: 'float32',
      //   data: Float32Array(1024) [ ... ],
      //   size: 1024
      // }
    • Audio embeddings with Xenova/clap-htsat-unfused:

      import { AutoProcessor, ClapAudioModelWithProjection, read_audio } from '@xenova/transformers';
      
      // Load processor and audio model
      const processor = await AutoProcessor.from_pretrained('Xenova/clap-htsat-unfused');
      const audio_model = await ClapAudioModelWithProjection.from_pretrained('Xenova/clap-htsat-unfused');
      
      // Read audio and run processor
      const audio = await read_audio('https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/cat_meow.wav');
      const audio_inputs = await processor(audio);
      
      // Compute embeddings
      const { audio_embeds } = await audio_model(audio_inputs);
      // Tensor {
      //   dims: [ 1, 512 ],
      //   type: 'float32',
      //   data: Float32Array(512) [ ... ],
      //   size: 512
      // }
  2. Audio Spectrogram Transformer for audio classification (#427). See here for the list of available models.

    import { pipeline } from '@xenova/transformers';
    
    // Create an audio classification pipeline
    const classifier = await pipeline('audio-classification', 'Xenova/ast-finetuned-audioset-10-10-0.4593');
    
    // Predict class
    const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/cat_meow.wav';
    const output = await classifier(url, { topk: 4 });
    // [
    //   { label: 'Meow', score: 0.5617874264717102 },
    //   { label: 'Cat', score: 0.22365376353263855 },
    //   { label: 'Domestic animals, pets', score: 0.1141069084405899 },
    //   { label: 'Animal', score: 0.08985692262649536 },
    // ]
  3. ConvNeXT for image classification (#428). See here for the list of available models.

    import { pipeline } from '@xenova/transformers';
    
    // Create image classification pipeline
    const classifier = await pipeline('image-classification', 'Xenova/convnext-tiny-224');
    
    // Classify an image
    const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg';
    const output = await classifier(url);
    // [{ label: 'tiger, Panthera tigris', score: 0.6153212785720825 }]
  4. ConvNeXT-v2 for image classification (#428). See here for the list of available models.

    import { pipeline } from '@xenova/transformers';
    
    // Create image classification pipeline
    const classifier = await pipeline('image-classification', 'Xenova/convnextv2-atto-1k-224');
    
    // Classify an image
    const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg';
    const output = await classifier(url);
    // [{ label: 'tiger, Panthera tigris', score: 0.6391205191612244 }]

🔨 Other improvements

  • Support decoding of tensors in #416
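
    As a rough sketch of what this enables (assuming the change applies to the tokenizer's decode methods; the model below is purely illustrative), the Tensor returned by a tokenizer can be passed straight back for decoding:

    import { AutoTokenizer } from '@xenova/transformers';
    
    // Load a tokenizer (used here purely for illustration)
    const tokenizer = await AutoTokenizer.from_pretrained('Xenova/bert-base-uncased');
    
    // `input_ids` is returned as a Tensor
    const { input_ids } = tokenizer('I love transformers!');
    
    // The Tensor can be decoded directly, without converting it to a plain array first
    const decoded = tokenizer.batch_decode(input_ids, { skip_special_tokens: true });
    console.log(decoded); // e.g. [ 'i love transformers!' ]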

Full Changelog: 2.9.0...2.10.0

2.9.0

21 Nov 14:00

What's new?

😍 Exciting new tasks!

Transformers.js v2.9.0 adds support for three new tasks: (1) Depth estimation, (2) Zero-shot object detection, and (3) Optical document understanding.

🕵️‍♂️ Depth Estimation

The task of predicting the depth of objects present in an image. See here for more information.

import { pipeline } from '@xenova/transformers';

// Create depth estimation pipeline
let depth_estimator = await pipeline('depth-estimation', 'Xenova/dpt-hybrid-midas');

// Predict depth for image
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/cats.jpg';
let output = await depth_estimator(url);
(Images: the input photo and the predicted depth map)
Raw output
// {
//   predicted_depth: Tensor {
//     dims: [ 384, 384 ],
//     type: 'float32',
//     data: Float32Array(147456) [ 542.859130859375, 545.2833862304688, 546.1649169921875, ... ],
//     size: 147456
//   },
//   depth: RawImage {
//     data: Uint8Array(307200) [ 86, 86, 86, ... ],
//     width: 640,
//     height: 480,
//     channels: 1
//   }
// }

🎯 Zero-shot Object Detection

The task of identifying objects of classes that are unseen during training. See here for more information.

import { pipeline } from '@xenova/transformers';

// Create zero-shot object detection pipeline
let detector = await pipeline('zero-shot-object-detection', 'Xenova/owlvit-base-patch32');

// Predict bounding boxes
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/astronaut.png';
let candidate_labels = ['human face', 'rocket', 'helmet', 'american flag'];
let output = await detector(url, candidate_labels);


Raw output
// [
//   {
//     score: 0.24392342567443848,
//     label: 'human face',
//     box: { xmin: 180, ymin: 67, xmax: 274, ymax: 175 }
//   },
//   {
//     score: 0.15129457414150238,
//     label: 'american flag',
//     box: { xmin: 0, ymin: 4, xmax: 106, ymax: 513 }
//   },
//   {
//     score: 0.13649864494800568,
//     label: 'helmet',
//     box: { xmin: 277, ymin: 337, xmax: 511, ymax: 511 }
//   },
//   {
//     score: 0.10262022167444229,
//     label: 'rocket',
//     box: { xmin: 352, ymin: -1, xmax: 463, ymax: 287 }
//   }
// ]

📝 Optical Document Understanding (image-to-text)

This task involves translating images of scientific PDFs to markdown, enabling easier access to them. See here for more information.

import { pipeline } from '@xenova/transformers';

// Create image-to-text pipeline
let pipe = await pipeline('image-to-text', 'Xenova/nougat-small');

// Generate markdown
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/nougat_paper.png';
let output = await pipe(url, {
  min_length: 1,
  max_new_tokens: 40,
  bad_words_ids: [[pipe.tokenizer.unk_token_id]],
});
// [{ generated_text: "# Nougat: Neural Optical Understanding for Academic Documents\n\nLukas Blecher\n\nCorrespondence to: lblecher@meta.com\n\nGuillem Cucur" }]

💻 New architectures: Nougat, DPT, GLPN, OwlViT

We added support for 4 new architectures, bringing the total up to 61!

  • DPT for depth estimation. See here for the list of available models.
  • GLPN for depth estimation. See here for the list of available models.
  • OwlViT for zero-shot object detection. See here for the list of available models.
  • Nougat for optical understanding of academic documents (image-to-text). See here for the list of available models.
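
For example, depth estimation with a GLPN checkpoint works the same way as the DPT example above; a minimal sketch, assuming Xenova/glpn-kitti appears in the linked list of converted models:

import { pipeline } from '@xenova/transformers';

// Create depth estimation pipeline with a GLPN model
let depth_estimator = await pipeline('depth-estimation', 'Xenova/glpn-kitti');

// Predict depth for image
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/cats.jpg';
let output = await depth_estimator(url);
// { predicted_depth: Tensor { ... }, depth: RawImage { ... } }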

🔨 Other improvements

  • Add support for Grouped Query Attention on Llama Model by @felladrin in #393
  • Implement max character check by @samlhuillier in #398
  • Add CLIPFeatureExtractor (and tests) in #387
  • Add jsDelivr stats to README in #395
  • Update sharp dependency version in #400

🐛 Bug fixes

  • Move tensor clone to fix Worker ownership NaN issue by @kungfooman in #404
  • Add default token_type_ids for multilingual-e5-* models by @do-me in #403
  • Ensure WASM fallback does not crash in GH actions in #402

🤗 New contributors

Full Changelog: 2.8.0...2.9.0

2.8.0

09 Nov 16:53

What's new?

🖼️ New task: Image-to-image

This release adds support for image-to-image translation (e.g., super-resolution) with Swin2SR models.

(Images: side-by-side full comparison and zoomed animated comparison of the upscaled output)

As always, you can get started in just a few lines of code!

import { pipeline } from '@xenova/transformers';

let url = 'https://huggingface.co/spaces/jjourney1125/swin2sr/resolve/main/testsets/real-inputs/0855.jpg';
let upscaler = await pipeline('image-to-image', 'Xenova/swin2SR-compressed-sr-x4-48');
let output = await upscaler(url);
// RawImage {
//   data: Uint8Array(12582912) [165, 166, 163, ...],
//   width: 2048,
//   height: 2048,
//   channels: 3
// }

💻 New architectures: TrOCR, Swin2SR, Mistral, and Falcon

We also added support for 4 new architectures, bringing the total up to 57! 🤯

  • TrOCR for optical character recognition (OCR).

    import { pipeline } from '@xenova/transformers';
    
    let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/handwriting.jpg';
    let captioner = await pipeline('image-to-text', 'Xenova/trocr-small-handwritten');
    let output = await captioner(url);
    // [{ generated_text: 'Mr. Brown commented icily.' }]


    Added in #375. See here for the list of available models.

  • Swin2SR for super-resolution and image restoration.

    import { pipeline } from '@xenova/transformers';
    
    let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/butterfly.jpg';
    let upscaler = await pipeline('image-to-image', 'Xenova/swin2SR-classical-sr-x2-64');
    let output = await upscaler(url);
    // RawImage {
    //   data: Uint8Array(786432) [ 41, 31, 24,  43, ... ],
    //   width: 512,
    //   height: 512,
    //   channels: 3
    // }

    Added in #381. See here for the list of available models.

  • Mistral and Falcon for text-generation. Added in #379.
    Note: Other than testing models, we haven't yet converted any of the larger (≥7B parameter) models. Stay tuned for more updates on this!

🐛 Bug fixes:

  • By default, do not add special tokens at start of text-generation (see commit)
  • Fix Firefox bug when displaying progress events while reading file from browser cache in #374. Thanks to @felladrin for reporting this issue!
  • Fix text2text-generation pipeline output inconsistency w/ python library in #384

🔨 Minor improvements:

  • Upgrade typescript dependency version by @Kit-p in #368
  • Improve docs in #385

🤗 New Contributors

Full Changelog: 2.7.0...2.8.0