ModelCheckpoint options save_last='link' and save_top_k=-1 cause recursive symlink creation #21110

@Hippskill

Bug description

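Running the snippet below with save_last='link' and save_top_k=-1 creates a self-referential symlink, last.ckpt -> last.ckpt, so the link points at itself rather than at a saved checkpoint.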

What version are you seeing the problem on?

v2.5

Reproduced in studio

No response

How to reproduce the bug

import uuid

import torch
from pytorch_lightning import Trainer, LightningModule
from pytorch_lightning.callbacks import ModelCheckpoint
from torch.utils.data import DataLoader, Dataset


class RandomDataset(Dataset):
    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len


class BoringModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        loss = self(batch).sum()
        self.log("train_loss", loss)
        return {"loss": loss}

    def validation_step(self, batch, batch_idx):
        loss = self(batch).sum()
        self.log("valid_loss", loss)

    def test_step(self, batch, batch_idx):
        loss = self(batch).sum()
        self.log("test_loss", loss)

    def configure_optimizers(self):
        return torch.optim.SGD(self.layer.parameters(), lr=0.1)


if __name__ == "__main__":
    tmpdir = f"/tmp/{uuid.uuid4()}"
    print(tmpdir)

    trainer = Trainer(
        default_root_dir=tmpdir,
        max_epochs=2,
        callbacks=[ModelCheckpoint(dirpath=tmpdir, every_n_epochs=10, save_last='link', save_top_k=-1)],
        enable_checkpointing=True,
    )
    model = BoringModel()
    trainer.fit(model, train_dataloaders=DataLoader(RandomDataset(32, 64), batch_size=2))

Error messages and logs

ls -lh /tmp/0087f725-e4fa-42bc-a60d-51dfbcd57b41
9 Aug 22 13:53 last.ckpt -> last.ckpt
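
To check whether an existing checkpoint directory is affected, a minimal sketch like the following may help. It uses only the Python standard library and does not call into Lightning. The names find_self_links and demo are hypothetical helpers, and the checkpoint file name in demo is made up for illustration. The demo recreates the situation from the listing above by calling os.symlink with the same name for target and link, then scans the directory for links that point at themselves. It assumes a POSIX filesystem where creating symlinks is permitted.

```python
import os
import tempfile
from pathlib import Path


def find_self_links(directory):
    """Return the sorted names of symlinks in `directory` that point at themselves.

    Hypothetical helper for illustration; not part of pytorch_lightning.
    """
    self_links = []
    for entry in Path(directory).iterdir():
        # is_symlink() inspects the link itself, so it also works for broken links.
        if not entry.is_symlink():
            continue
        target = Path(os.readlink(entry))
        # A relative target is interpreted relative to the link's own directory.
        if not target.is_absolute():
            target = entry.parent / target
        if os.path.normpath(target) == os.path.normpath(entry):
            self_links.append(entry.name)
    return sorted(self_links)


def demo():
    """Build a temporary directory with one healthy link and one self-link."""
    with tempfile.TemporaryDirectory() as tmpdir:
        root = Path(tmpdir)
        # Made-up checkpoint file name, used only to have a valid link target.
        (root / "epoch=1-step=64.ckpt").write_bytes(b"weights")
        os.symlink("epoch=1-step=64.ckpt", root / "good.ckpt")
        # Same name for target and link: reproduces last.ckpt -> last.ckpt.
        os.symlink("last.ckpt", root / "last.ckpt")
        return find_self_links(root)


if __name__ == "__main__":
    print(demo())
```

Calling find_self_links on the tmpdir printed by the reproduction script should report last.ckpt if the bug was triggered. The size of 9 in the listing above is consistent with this: a symlink's size is the length of its target path, and "last.ckpt" is 9 characters long.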

Environment

pytorch-lightning==2.5.2
torch==2.7.1

More info

This may be related to the code introduced in #12391.

cc @lantiga
