
NaN in DPMSolverMultistepInverseScheduler #10748

Open
@DekuLiuTesla

Description


Hi everyone, I'm new to diffusers. I'm trying to use DPMSolverMultistepInverseScheduler for DDIM inversion with the following config:

dpmpp_2m_sde_karras_scheduler_inv = DPMSolverMultistepInverseScheduler(
    num_train_timesteps=1000,
    beta_start=0.00085,
    beta_end=0.012,
    algorithm_type="sde-dpmsolver++",
    use_karras_sigmas=True,
    steps_offset=1
)

The DDIM inversion is then performed with:

self.scheduler.set_timesteps(self.inv_config.steps)
timesteps = self.scheduler.timesteps
with torch.autocast(device_type=self.device, dtype=self.dtype):
    for i, t in enumerate(tqdm(timesteps)):
        noises = []
        x_index = torch.arange(len(x))
        batches = x_index.split(self.batch_size, dim=0)
        for batch in batches:
            noise = self.pred_noise(
                x[batch], conds, timesteps[i], concat_conds=x[batch], batch_idx=batch)
            noises += [noise]
        noises = torch.cat(noises)
        
        x = self.scheduler.step(noises, t, x, generator=self.rng, return_dict=False)[0]

But NaN occurs in the first scheduler step. I dug into it and found that it happens in dpm_solver_first_order_update:

sigma_t, sigma_s = self.sigmas[self.step_index + 1], self.sigmas[self.step_index]
alpha_t, sigma_t = self._sigma_to_alpha_sigma_t(sigma_t)
alpha_s, sigma_s = self._sigma_to_alpha_sigma_t(sigma_s)
lambda_t = torch.log(alpha_t) - torch.log(sigma_t)
lambda_s = torch.log(alpha_s) - torch.log(sigma_s)
h = lambda_t - lambda_s

Here self.sigmas is tensor([ 0.0292, 0.0462, 0.0710, 0.1065, 0.1563, 0.2249, 0.3178, 0.4417, 0.6050, 0.8176, 1.0911, 1.4396, 1.8795, 2.4300, 3.1132, 3.9548, 4.9844, 6.2356, 7.7471, 9.5622, 11.7303, 14.3068, 17.3539, 20.9411, 25.1461, 25.1461]). Because the sigmas are in increasing order, lambda_t is smaller than lambda_s, and therefore h is negative.
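The sign of h can be checked with a small sketch using two consecutive sigmas from the schedule above. The alpha/sigma conversion here (alpha = 1/sqrt(sigma^2 + 1), sigma' = sigma * alpha) is my reading of the usual VP parameterization behind _sigma_to_alpha_sigma_t, so treat it as an assumption:

```python
import math

# Two consecutive sigmas from the reported (increasing) Karras schedule.
sigma_s, sigma_t = 0.0292, 0.0462

def sigma_to_alpha_sigma(sigma):
    # Assumed VP-style conversion: alpha = 1/sqrt(sigma^2 + 1), sigma' = sigma * alpha
    alpha = 1.0 / math.sqrt(sigma**2 + 1.0)
    return alpha, sigma * alpha

alpha_t, s_t = sigma_to_alpha_sigma(sigma_t)
alpha_s, s_s = sigma_to_alpha_sigma(sigma_s)

# lambda = log(alpha) - log(sigma * alpha) = -log(sigma)
lambda_t = math.log(alpha_t) - math.log(s_t)
lambda_s = math.log(alpha_s) - math.log(s_s)

# h = log(sigma_s / sigma_t), which is negative whenever sigma_t > sigma_s
h = lambda_t - lambda_s
print(h)  # negative
```

So with an increasing sigma schedule, h is negative at every step regardless of the particular values.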
As a result, in

elif self.config.algorithm_type == "sde-dpmsolver++":
    assert noise is not None
    x_t = (
        (sigma_t / sigma_s * torch.exp(-h)) * sample
        + (alpha_t * (1 - torch.exp(-2.0 * h))) * model_output
        + sigma_t * torch.sqrt(1.0 - torch.exp(-2 * h)) * noise
    )

torch.sqrt(1.0 - torch.exp(-2 * h)) becomes NaN: for negative h, exp(-2h) > 1, so the argument of the square root is negative. But I noticed that in DPMSolverMultistepScheduler, the problem is avoided by flipping the Karras sigmas into decreasing order:

if self.config.use_karras_sigmas:
    sigmas = np.flip(sigmas).copy()
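The failure mode can be verified numerically. For a negative h (an illustrative value below), the radicand of the square root drops below zero; torch.sqrt maps a negative input to NaN (where math.sqrt would instead raise), which is exactly the NaN observed in the step:

```python
import math

h = -0.459  # illustrative negative h, roughly log(0.0292 / 0.0462)

# Noise coefficient's radicand in the sde-dpmsolver++ branch: 1 - exp(-2h)
radicand = 1.0 - math.exp(-2.0 * h)
print(radicand)  # < 0 for any h < 0

# torch.sqrt of a negative tensor yields NaN rather than raising
noise_scale = math.sqrt(radicand) if radicand >= 0 else float("nan")
print(math.isnan(noise_scale))
```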

I've searched through many usage examples, but I still can't figure out the root of the problem. Can anybody help? 🙏
