The masked_fill function in the distilbert model implementation currently has unintuitive logic
fn masked_fill(on_false: &Tensor, mask: &Tensor, on_true: f32) -> Result<Tensor> {
    let shape = mask.shape();
    let on_true = Tensor::new(on_true, on_false.device())?.broadcast_as(shape.dims())?;
    let m = mask.where_cond(&on_true, on_false)?;
    Ok(m)
}
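For reference, a minimal sketch of how this signature behaves (assuming the candle_core crate, a CPU device, a u32 mask, and the masked_fill function above being in scope; the fill value is only illustrative): where_cond keeps on_false at positions where the mask is zero and writes the broadcast scalar on_true where it is non-zero, so padding positions must be marked with 1, the opposite of what the tokenizer produces.
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let device = Device::Cpu;
    // Toy attention scores for a sequence of length 4.
    let scores = Tensor::new(&[0.5f32, 1.0, -0.3, 0.8], &device)?;
    // With the current signature, 1 marks the positions to overwrite,
    // i.e. the padding positions -- the inverse of the tokenizer's mask.
    let inverted_mask = Tensor::new(&[0u32, 0, 0, 1], &device)?;
    let filled = masked_fill(&scores, &inverted_mask, f32::NEG_INFINITY)?;
    println!("{filled}"); // last position becomes -inf, the rest stay unchanged
    Ok(())
}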
In the current setup, the user must invert the attention mask obtained from the tokenizer before passing it to model.forward. This requirement can be confusing, as it differs from the transformers implementation, where the mask is passed exactly as the tokenizer returns it (1 for tokens to attend to, 0 for padding):
...
let text: Vec<&str> = vec![...];
let encoded = tokenizer.encode_batch(text.clone(), true)?;
let input_ids = encoded.iter().map(|v| v.get_ids().to_vec()).collect::<Vec<_>>();
let input_ids = Tensor::new(input_ids, &device)?;
let attention_mask = encoded.iter().map(|encoding| encoding.get_attention_mask().to_vec()).collect::<Vec<_>>();
let attention_mask = Tensor::new(attention_mask, &device)?;
let (batch_size, feature_size) = input_ids.dims2()?;
// Invert the attention mask for correct behavior --> Counterintuitive
let attention_mask = attention_mask.eq(0u32)?.reshape((batch_size, 1, 1, feature_size))?;
let output = model.forward(&input_ids, &attention_mask)?;
...
Proposition:
Replace the masked_fill function with:
fn masked_fill(on_true: &Tensor, mask: &Tensor, on_false: f32) -> Result<Tensor> {
    let shape = mask.shape();
    let on_false = Tensor::new(on_false, on_true.device())?.broadcast_as(shape.dims())?;
    // Keep the original values where the mask is non-zero (tokens to attend to)
    // and write `on_false` where it is zero (padding).
    let m = mask.where_cond(on_true, &on_false)?;
    Ok(m)
}
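With this signature, the mask semantics match the tokenizer's, so the caller-side inversion disappears. A minimal sketch of the calling code under the assumption that the model's attention code passes the mask straight through to masked_fill with a large negative fill value (the forward call itself is unchanged):
...
let (batch_size, feature_size) = input_ids.dims2()?;
// No inversion needed: the tokenizer's mask (1 = attend, 0 = padding) is used directly.
let attention_mask = attention_mask.reshape((batch_size, 1, 1, feature_size))?;
let output = model.forward(&input_ids, &attention_mask)?;
...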