AdamP low GPU usage #1211

hadarshavit · 2022-04-06T16:54:46Z


Hi. When I train a model with fusedlamb, GPU usage stays above 90% (RTX 3090). When I train the same model with AdamP, GPU usage is much lower (it varies between 60% and 90%) and training is 20-30% slower. Is there a way to increase GPU usage for faster training?
Thanks

Answered by rwightman

rwightman · 2022-04-06T17:15:25Z

Maintainer

@hadarshavit

this,

def projection(p, grad, perturb, delta: float, wd_ratio: float, eps: float):
    wd = 1.
    expand_size = (-1,) + (1,) * (len(p.shape) - 1)
    for view_func in [_channel_view, _layer_view]:
        param_view = view_func(p)
        grad_view = view_func(grad)
        cosine_sim = F.cosine_similarity(grad_view, param_view, dim=1, eps=eps).abs_()

        # FIXME this is a problem for PyTorch XLA
        if cosine_sim.max() < delta / math.sqrt(param_view.size(1)):
            p_n = p / param_view.norm(p=2, dim=1).add_(eps).reshape(expand_size)
            perturb -= p_n * view_func(p_n * perturb).sum(dim=1).reshape(expand_size)
            wd = wd_ratio
            return perturb, wd

    return perturb, wd

https://github.com/rwightman/pytorch-image-models/blob/master/timm/optim/adamp.py#L25-L40

is not efficient and adds quite a bit of overhead. I haven't tried it, but maybe a @torch.jit.script decorator on that function would help?

1 reply

hadarshavit Apr 6, 2022
Author

Thanks for the response! It is a bit faster (GPU usage is a few percent higher), but still slower than fusedlamb.
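GPU utilization is a coarse signal, so it can help to time the optimizer step on its own. The sketch below does that. `mean_step_time` is a hypothetical helper, not something from timm, and `AdamW` on a small conv layer is only a stand-in. It feeds fixed random gradients, so an optimizer whose work depends on the gradient direction (such as AdamP's projection) may behave differently than in real training.

```python
import time

import torch


def mean_step_time(optimizer, params, steps: int = 20, warmup: int = 3) -> float:
    """Average seconds per optimizer.step(), using fixed random gradients."""
    use_cuda = any(p.is_cuda for p in params)
    for p in params:
        p.grad = torch.randn_like(p)
    for i in range(warmup + steps):
        if i == warmup:
            # Start the clock only after warmup; drain queued GPU work first.
            if use_cuda:
                torch.cuda.synchronize()
            start = time.perf_counter()
        optimizer.step()
    if use_cuda:
        # CUDA kernels run asynchronously, so wait before reading the clock.
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / steps


# Stand-in model and optimizer; swap in the real ones to compare them.
model = torch.nn.Conv2d(8, 16, 3)
params = list(model.parameters())
print(mean_step_time(torch.optim.AdamW(params, lr=1e-3), params))
```

Running it once per optimizer on the same parameters gives a rough per-step comparison that excludes the forward and backward passes.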
