
Mixed Precision Training #113

Open
zsxzs opened this issue May 30, 2022 · 2 comments
Comments

zsxzs commented May 30, 2022

Hi! Thank you for your wonderful work and great code!
Have you tried mixed precision training? I used mixed precision when training the deblurring model on the GoPro dataset, but the loss value becomes NaN after a few dozen epochs.
Could you help me find out what the problem is?


zsxzs commented May 30, 2022

This is the modified code. Unfortunately the batch size is only 4 (there isn't enough GPU memory).

```python
import time

import torch
from tqdm import tqdm
from torch.cuda.amp import autocast, GradScaler  # import once, not inside the loop

if opt.TRAINING.FP16:
    scaler = GradScaler()
else:
    scaler = None

for epoch in range(start_epoch, opt.OPTIM.NUM_EPOCHS + 1):
    epoch_start_time = time.time()
    total_loss = 0
    val_loss = 0
    train_id = 1
    model_restoration.train()
    for i, data in enumerate(tqdm(train_loader), 0):

        # Cheaper than optimizer.zero_grad(): freed grads skip the
        # accumulation add in the next backward pass
        for param in model_restoration.parameters():
            param.grad = None

        with torch.no_grad():
            target = data[0].cuda()
            input_ = data[1].cuda()

        if opt.TRAINING.FP16:  # scaler is not None
            with autocast():
                restored = model_restoration(input_)

                # Compute the loss at each stage; the builtin sum() keeps the
                # result a torch tensor on the autograd graph (np.sum is
                # unnecessary here)
                loss_char = sum(criterion_char(restored[j], target) for j in range(len(restored)))
                loss_edge = sum(criterion_edge(restored[j], target) for j in range(len(restored)))
                loss = loss_char + 0.05 * loss_edge

            scaler.scale(loss).backward()
            scaler.step(optimizer)  # step is skipped automatically if grads contain inf/NaN
            scaler.update()

        else:
            restored = model_restoration(input_)

            # Compute the loss at each stage
            loss_char = sum(criterion_char(restored[j], target) for j in range(len(restored)))
            loss_edge = sum(criterion_edge(restored[j], target) for j in range(len(restored)))
            loss = loss_char + 0.05 * loss_edge

            loss.backward()
            optimizer.step()

        total_loss += loss.item()
```
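A common mitigation for NaN losses under AMP (an assumption on my part, not a fix confirmed by the authors) is to unscale the gradients before stepping so they can be clipped at their true magnitude; `GradScaler.step` then skips any update whose gradients still contain inf/NaN. A minimal sketch with a stand-in linear model and a hypothetical clip value of 0.01:

```python
import torch
from torch.cuda.amp import GradScaler, autocast

# Stand-in model and optimizer; in the issue's code these would be
# model_restoration and its optimizer
model = torch.nn.Linear(8, 8)
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)

use_amp = torch.cuda.is_available()  # autocast(cuda) needs a GPU
scaler = GradScaler(enabled=use_amp)

x, target = torch.randn(4, 8), torch.randn(4, 8)
if use_amp:
    model, x, target = model.cuda(), x.cuda(), target.cuda()

optimizer.zero_grad(set_to_none=True)
with autocast(enabled=use_amp):
    loss = torch.nn.functional.l1_loss(model(x), target)

scaler.scale(loss).backward()
scaler.unscale_(optimizer)  # grads are now fp32 and unscaled
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.01)
scaler.step(optimizer)      # skipped if inf/NaN gradients remain
scaler.update()
```

Lowering the initial loss scale (`GradScaler(init_scale=...)`) is another knob worth trying if the overflow happens early in training.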

drifterss commented

Hi! Thank you for your wonderful work and great code!
Is the FP16 hyperparameter a Boolean that should be set to true? I don't currently see this hyperparameter in the training.yml file. Thank you for your reply.
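For what it's worth, the flag read by the snippet above is `opt.TRAINING.FP16`, so (assuming the repo's usual YAML-to-options loading) one would add it manually under the `TRAINING` section of training.yml, for example:

```yaml
TRAINING:
  FP16: true   # hypothetical entry; not present in the repo's stock training.yml
```

This is only a sketch of where the flag would live; the stock config does not define it, which matches what you observed.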
