PyTorch implementation and pretrained models for FLSL. For details, see FLSL: Feature-Level Self-supervised Learning.
Please install PyTorch and download the ImageNet dataset. This codebase has been developed with Python 3.9, PyTorch 1.12.0, CUDA 11.6, and torchvision 0.13.0. For a glimpse at the full documentation of the training options, please run:
python --help
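The package versions listed above can be compared against the local environment before training. The following is a minimal sketch, not part of the FLSL codebase: the helper names `installed_versions` and `find_mismatches` are hypothetical, and treating a local build tag such as `+cu116` as a match is an assumption.

```python
from importlib import metadata

# Versions stated in this README; the check itself is only an illustration.
EXPECTED = {"torch": "1.12.0", "torchvision": "0.13.0"}


def installed_versions(packages):
    """Return {package: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(installed, expected=EXPECTED):
    """List (name, expected, installed) for packages that differ from the README."""
    mismatches = []
    for name, wanted in expected.items():
        have = installed.get(name)
        # A local build tag such as "1.12.0+cu116" still counts as a match.
        if have is None or not have.startswith(wanted):
            mismatches.append((name, wanted, have))
    return mismatches


if __name__ == "__main__":
    for name, wanted, have in find_mismatches(installed_versions(EXPECTED)):
        print(f"{name}: expected {wanted}, found {have}")
```

Running the file prints one line per package that is missing or differs from the versions above, and prints nothing when the environment matches.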
The following command runs FLSL with a ViT-Small network on a single node with 8 GPUs for 300 epochs. Training takes 3.5 days.
To pretrain on ImageNet-1K, run:
torchrun --standalone --nproc_per_node=gpu \
./ \
--arch vit_small \
--patch_size 16 \
--out_dim 4096 \
--output_dir ./output/ \
--data_path /directory/to/imagenet-1k/train/ \
--local_crops_number 2 \
--local_crops_scale 0.05 0.4 \
--global_crops_scale_t 0.8 1.0 \
--global_crops_scale_s 0.5 1.0 \
--random_pooling_window 2 \
--norm_last_layer True \
--batch_size_per_gpu 64 \
--epochs 300 \
--warmup_teacher_temp_epochs 30 \
--warmup_teacher_temp 0.04 \
--teacher_temp 0.07 \
--teacher_centering False \
--local_crops_size 96
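Two quantities implied by the flags above can be worked out by hand. The sketch below computes the effective batch size (8 GPUs at 64 images each) and a teacher temperature schedule. The schedule shape, a linear warm-up followed by a constant value, is an assumption: the flag names match those of the DINO codebase, which uses that shape, but the FLSL training script may differ. Both function names are hypothetical.

```python
def effective_batch_size(batch_size_per_gpu, num_gpus):
    """Total number of images per optimization step across all GPUs."""
    return batch_size_per_gpu * num_gpus


def teacher_temp_schedule(warmup_temp, final_temp, warmup_epochs, total_epochs):
    """Per-epoch teacher temperature (assumed DINO-style schedule).

    Rises linearly from warmup_temp to final_temp over the first
    warmup_epochs epochs, then stays at final_temp.
    """
    if warmup_epochs > 1:
        step = (final_temp - warmup_temp) / (warmup_epochs - 1)
        warmup = [warmup_temp + i * step for i in range(warmup_epochs)]
    else:
        warmup = [warmup_temp] * warmup_epochs
    return warmup + [final_temp] * (total_epochs - warmup_epochs)


if __name__ == "__main__":
    # Values taken from the command above.
    print(effective_batch_size(batch_size_per_gpu=64, num_gpus=8))
    schedule = teacher_temp_schedule(0.04, 0.07, warmup_epochs=30, total_epochs=300)
    print(schedule[0], schedule[29], schedule[-1])
```

When running on a different number of GPUs, the effective batch size changes accordingly, so the learning rate may need to be adjusted.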
Change vit_small to vit_base to run FLSL with the ViT-Base model.
The weights of the models pretrained on ImageNet can be downloaded below.
| Dataset | Arch | Checkpoint |
|---|---|---|
| IN-1K | ViT-S/16 | download |
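Checkpoints saved from distributed training often carry wrapper prefixes on their parameter names, which must be removed before loading the weights into a bare backbone. The sketch below shows one way to do that on a plain dictionary. The prefixes `module.` and `backbone.` and the helper name `strip_prefixes` are assumptions for illustration; the actual key layout of the FLSL checkpoint should be inspected first.

```python
def strip_prefixes(state_dict, prefixes=("module.", "backbone.")):
    """Return a copy of state_dict with leading wrapper prefixes removed.

    Prefixes are removed in the order given, each at most once per key,
    so "module.backbone.x" becomes "x" with the default prefixes.
    """
    cleaned = {}
    for key, value in state_dict.items():
        for prefix in prefixes:
            if key.startswith(prefix):
                key = key[len(prefix):]
        cleaned[key] = value
    return cleaned


if __name__ == "__main__":
    # Stand-in for a loaded checkpoint; real values would be tensors.
    example = {"module.backbone.cls_token": 0, "module.head.weight": 1}
    print(sorted(strip_prefixes(example)))
    # With PyTorch installed, the cleaned dictionary could then be passed to
    # model.load_state_dict(cleaned, strict=False), where model is the
    # backbone (hypothetical variable) built for the matching architecture.
```

Loading with `strict=False` reports missing and unexpected keys instead of raising an error, which helps confirm whether the assumed prefixes were the right ones.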
Step 1. Prepare COCO dataset
The dataset can be downloaded at
Step 2. Install mmdetection
git clone
Step 3. Fine-tune on the COCO dataset
tools/ configs/selfpatch/ [number of gpu] \
--work-dir /path/to/saving_dir \
--seed 0 --deterministic \
--options model.pretrained=/path/to/model_dir
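The `--options model.pretrained=...` argument overrides a single nested field of the configuration file from the command line. The sketch below illustrates the idea of a dotted-key override on a plain nested dictionary. It is not mmdetection's implementation, the helper names `parse_option` and `apply_override` are hypothetical, and the example configuration is a placeholder.

```python
def parse_option(option):
    """Split "a.b.c=value" into ("a.b.c", "value")."""
    key, _, value = option.partition("=")
    return key, value


def apply_override(config, dotted_key, value):
    """Set config[a][b][c] = value for dotted_key "a.b.c", creating levels as needed."""
    keys = dotted_key.split(".")
    node = config
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return config


if __name__ == "__main__":
    # Placeholder configuration; the real one comes from the config file.
    config = {"model": {"type": "ExampleDetector", "pretrained": None}}
    key, value = parse_option("model.pretrained=/path/to/model_dir")
    apply_override(config, key, value)
    print(config["model"]["pretrained"])
```

Only the named field changes; every other setting in the configuration file is left as written.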
Step 1. Prepare ADE20K dataset
The dataset can be downloaded at
or by following the instructions of
Step 2. Install mmsegmentation
git clone
Step 3. Convert your model
python tools/model_converters/ /path/to/model_dir /path/to/saving_dir
Step 4. Fine-tune on the ADE20K dataset
tools/ configs/selfpatch/ [number of gpu] \
--work-dir /path/to/saving_dir \
--seed 0 --deterministic \
--options model.pretrained=/path/to/model_dir
The optimization hyperparameters are adopted from XCiT.
Step 1. Prepare DAVIS 2017 data
cd $HOME
git clone
cd davis-2017
Step 2. Run video object segmentation
--data_path /path/to/davis-2017/DAVIS/ \
--output_dir /path/to/saving_dir \
--pretrained_weights /path/to/model_dir \
--arch vit_small \
--patch_size 16
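With `--patch_size 16`, a ViT produces one feature token per 16x16 pixel patch, so the feature map used for segmentation is much coarser than the input frame. The sketch below computes that grid size. It assumes that pixels beyond the last full patch are dropped; the evaluation script may instead resize or pad frames, and the function names are hypothetical.

```python
def patch_grid(height, width, patch_size=16):
    """Number of patch rows and columns for an image of the given size."""
    return height // patch_size, width // patch_size


def num_patch_tokens(height, width, patch_size=16):
    """Total number of patch tokens, excluding any class token."""
    rows, cols = patch_grid(height, width, patch_size)
    return rows * cols


if __name__ == "__main__":
    # An example 480x854 frame, and the 96x96 local crop used in pretraining.
    print(patch_grid(480, 854))
    print(num_patch_tokens(96, 96))
```

A smaller patch size gives a finer grid and therefore finer masks, at the cost of more tokens per frame.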
If you find this repository useful, please consider giving it a star ⭐ and citing the paper:
title={{FLSL}: Feature-level Self-supervised Learning},
author={Qing Su and Anton Netchaev and Hai Li and Shihao Ji},
booktitle={Thirty-seventh Conference on Neural Information Processing Systems},