[Paper]
This code is based on DiffusionCLIP.
This work addresses the efficiency of recent text-driven editing methods based on unconditional diffusion models and provides an algorithm that learns image manipulations 4.5–10× faster and applies them 8× faster than DiffusionCLIP.
We provide the following two settings for image manipulation:
1. Prelearned image manipulations
The pretrained diffusion model is adapted to the given textual transform on 50 images. Then, you can apply the learned transform to your images. The entire procedure takes about 45 seconds on an NVIDIA A100.
2. Single-image editing
The pretrained diffusion model is adapted to your text description and image on the fly. This setting takes about 4 seconds on an NVIDIA A100.
This work uses unconditional diffusion models pretrained on the CelebA-HQ-256, LSUN-Church-256, AFHQ-Dog-256 and ImageNet-512 datasets.
This notebook provides a tool for single-image editing using our approach. You are welcome to edit your images according to any textual transform. Please pay close attention to the hyperparameter values.
- Install required dependencies
# Clone the repo
!git clone https://github.com/quickjkee/eff-diff-edit
# Install dependencies
!pip install ftfy regex tqdm
!pip install lmdb
!pip install pynvml
!pip install git+https://github.com/openai/CLIP.git
conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=<CUDA_VERSION>
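A quick optional sanity check that the installed dependencies are importable (a minimal sketch; clip here is the package installed from the OpenAI CLIP repo above):
# verify that PyTorch sees the GPU and that CLIP imports correctly
python -c "import torch, clip; print(torch.__version__, torch.cuda.is_available())"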
- Download pretrained diffusion models
- Download datasets (this part can be skipped if you have your own training set; please see the second section for details)
- For CelebA-HQ and AFHQ-Dog you can use the following code:
# CelebA-HQ 256x256
bash data_download.sh celeba_hq .
# AFHQ-Dog 256x256
bash data_download.sh afhq .
- For LSUN-Church and ImageNet, you can download them from the original sources and put them into ./data/lsun or ./data/imagenet, respectively.
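The exact sub-folder layout inside these directories is dataset-specific and not spelled out here; a minimal sketch of creating the target folders at the repo root (an assumption about where the loaders look):
# create the data folders referenced above
mkdir -p ./data/lsun ./data/imagenet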
- Select the config for the particular dataset: celeba.yml / afhq.yml / church.yml / imagenet.yml.
- Select the desired manipulation from the list of available textual transforms for each dataset, provided in the repository. Note that you can also add your own transforms to this file.
- Check out the descriptions for the available options in the repository.
Below we provide the commands for different settings:
- Prelearned image manipulations (dataset training and dataset test)
This command adapts the pretrained model using images from the training set and applies the learned transform to the test images. The following command uses 50 CelebA-HQ images for training and evaluation:
python main.py --clip_finetune \
--config celeba.yml \
--exp ./runs/test \
--edit_attr makeup \
--fast_noising_train 1 \
--fast_noising_test 1 \
--own_test 0 \
--own_training 0 \
--single_image 0 \
--align_face 0 \
--n_train_img 50 \
--n_precomp_img 50 \
--n_test_img 50 \
--n_iter 5 \
--t_0 350 \
--n_inv_step 40 \
--n_train_step 6 \
--n_test_step 6 \
--lr_clip_finetune 6e-6 \
--id_loss_w 0.0 \
--clip_loss_w 3 \
--l1_loss_w 1.0
- Prelearned image manipulations (dataset training and own test)
This command adapts the pretrained model using images from the training set and applies the learned transform to your own images. Basically, one needs to change --own_test 0 to --own_test all. Before running, put your images into the ./imgs_for_test folder (see the sketch after the command below). Moreover, you can evaluate the learned transform on a single image: change --own_test all to --own_test <your_image_name>.
python main.py --clip_finetune \
--config celeba.yml \
--exp ./runs/test \
--edit_attr makeup \
--fast_noising_train 1 \
--fast_noising_test 1 \
--own_test all \
--own_training 0 \
--single_image 0 \
--align_face 0 \
--n_train_img 50 \
--n_precomp_img 50 \
--n_test_img 50 \
--n_iter 5 \
--t_0 350 \
--n_inv_step 40 \
--n_train_step 6 \
--n_test_step 6 \
--lr_clip_finetune 6e-6 \
--id_loss_w 0.0 \
--clip_loss_w 3 \
--l1_loss_w 1.0
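Before running the command above, make sure ./imgs_for_test exists and contains your images. A minimal sketch (the source file names are hypothetical placeholders):
# copy your own images into the test folder at the repo root
mkdir -p ./imgs_for_test
cp /path/to/photo_1.png /path/to/photo_2.png ./imgs_for_test/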
- Prelearned image manipulations (own training and own test)
If you want to adapt a diffusion model to your own dataset, simply put your images into the ./imgs_for_train folder (see the sketch after the command below) and change --own_training 0 to --own_training 1. In this case, you do not need to download any of the datasets above.
python main.py --clip_finetune \
--config celeba.yml \
--exp ./runs/test \
--edit_attr makeup \
--fast_noising_train 1 \
--fast_noising_test 1 \
--own_test all \
--own_training 1 \
--single_image 0 \
--align_face 0 \
--n_train_img 50 \
--n_precomp_img 50 \
--n_test_img 50 \
--n_iter 5 \
--t_0 350 \
--n_inv_step 40 \
--n_train_step 6 \
--n_test_step 6 \
--lr_clip_finetune 6e-6 \
--id_loss_w 0.0 \
--clip_loss_w 3 \
--l1_loss_w 1.0
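A minimal sketch of preparing the training folder (the source path is a hypothetical placeholder; presumably --n_train_img should not exceed the number of images placed here):
# copy your own training images into the training folder at the repo root
mkdir -p ./imgs_for_train
cp /path/to/my_dataset/*.png ./imgs_for_train/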
- Single-image editing (own image)
To transform your own image in single-image editing, change --single_image 0 to --single_image 1. Then, put the image into ./imgs_for_test and set --own_test <your_image_name>, for instance --own_test girl.png, as in the following example:
python main.py --clip_finetune \
--config celeba.yml \
--exp ./runs/test \
--edit_attr makeup \
--fast_noising_train 1 \
--fast_noising_test 1 \
--own_test girl.png \
--own_training 1 \
--single_image 1 \
--align_face 1 \
--n_train_img 1 \
--n_precomp_img 1 \
--n_test_img 1 \
--n_iter 5 \
--t_0 350 \
--n_inv_step 40 \
--n_train_step 6 \
--n_test_step 6 \
--lr_clip_finetune 6e-6 \
--id_loss_w 0.0 \
--clip_loss_w 3 \
--l1_loss_w 1.0
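The --exp flag in the commands above points each run at ./runs/test; presumably the edited images and finetuned checkpoints end up under that folder (an assumption, not a documented output layout), so a quick way to inspect a finished run is:
# list everything the run produced under the experiment folder
ls -R ./runs/test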