Loss Scale for Training Siglip #115

lezhang7 · 2024-06-24T15:47:55Z

Hi,

Thanks for your great work. I was trying to apply the SigLIP loss to train contrastive models. However, I find the loss scale is quite small, usually around 0.003 at the beginning. I wonder if anything is wrong in my implementation.

    n = logits.size(0)
    labels = 2 * torch.eye(n) - torch.ones(n, n)  # -1 with diagonal 1
    labels = labels.to(logits.device)
    loss = -torch.mean(F.logsigmoid(labels * logits)) / n
    return loss


udion · 2024-09-02T03:20:52Z

Did you normalize your feature vectors before creating the logits? The feature vectors should be unit vectors (divide each vector by its L2 norm).
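For illustration, here is a minimal sketch of what that normalization could look like in PyTorch, followed by a pairwise sigmoid loss. The function name `siglip_loss`, the argument names, and the fixed values `t=10.0` and `b=-10.0` are assumptions made for this example (in practice the temperature and bias are usually learnable parameters), so treat it as a sketch rather than the repository's implementation. It also sums over all pairs and divides by `n`, whereas the snippet in the question takes the mean over all `n * n` pairs and then divides by `n` again, which gives a value `n**2` times smaller.

```python
import torch
import torch.nn.functional as F


def siglip_loss(img_emb, txt_emb, t=10.0, b=-10.0):
    """Pairwise sigmoid loss on L2-normalized embeddings (illustrative sketch).

    t (temperature) and b (bias) are fixed here for simplicity; these
    values are assumptions, and they would normally be learnable.
    """
    # Scale every embedding to unit L2 norm so dot products are cosine similarities.
    img_emb = F.normalize(img_emb, p=2, dim=-1)
    txt_emb = F.normalize(txt_emb, p=2, dim=-1)

    # Similarity matrix, scaled by the temperature and shifted by the bias.
    logits = img_emb @ txt_emb.t() * t + b

    n = logits.size(0)
    # +1 on the diagonal (matching pairs), -1 everywhere else.
    labels = 2 * torch.eye(n, device=logits.device) - 1

    # Sum over all n * n pairs, then divide by the batch size n.
    return -F.logsigmoid(labels * logits).sum() / n


# Usage with random embeddings (hypothetical shapes: batch of 8, dimension 16).
torch.manual_seed(0)
img = torch.randn(8, 16)
txt = torch.randn(8, 16)
loss = siglip_loss(img, txt)
```

Because the embeddings are normalized inside the function, rescaling the raw features does not change the loss; only the temperature and bias control the range of the logits.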
