Commit d929c0e ("d-rin")
1 parent 837ac3f
File tree: 7 files changed, +869 and -2 lines

README.md (74 additions, 0 deletions)
@@ -24,6 +24,47 @@ conda activate taming

## Data Preparation

### ImageNet
The code will try to download (through [Academic
Torrents](http://academictorrents.com/)) and prepare ImageNet the first time it
is used. However, since ImageNet is quite large, this requires a lot of disk
space and time. If you already have ImageNet on your disk, you can speed things
up by putting the data into
`${XDG_CACHE}/autoencoders/data/ILSVRC2012_{split}/data/` (which defaults to
`~/.cache/autoencoders/data/ILSVRC2012_{split}/data/`), where `{split}` is one
of `train`/`validation`. It should have the following structure:

```
${XDG_CACHE}/autoencoders/data/ILSVRC2012_{split}/data/
├── n01440764
│   ├── n01440764_10026.JPEG
│   ├── n01440764_10027.JPEG
│   ├── ...
├── n01443537
│   ├── n01443537_10007.JPEG
│   ├── n01443537_10014.JPEG
│   ├── ...
├── ...
```

If you haven't extracted the data, you can also place
`ILSVRC2012_img_train.tar`/`ILSVRC2012_img_val.tar` (or symlinks to them) into
`${XDG_CACHE}/autoencoders/data/ILSVRC2012_train/` /
`${XDG_CACHE}/autoencoders/data/ILSVRC2012_validation/`, which will then be
extracted into the above structure without downloading it again. Note that this
will only happen if neither a folder
`${XDG_CACHE}/autoencoders/data/ILSVRC2012_{split}/data/` nor a file
`${XDG_CACHE}/autoencoders/data/ILSVRC2012_{split}/.ready` exists. Remove them
if you want to force running the dataset preparation again.
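The skip-if-prepared check described above can be sketched as follows. The helper name and exact logic are illustrative assumptions, not the repo's implementation:

```python
import os


def needs_preparation(root):
    """Hypothetical sketch of the readiness check: preparation is skipped
    when either an extracted data/ folder or a .ready marker exists."""
    has_data = os.path.isdir(os.path.join(root, "data"))
    has_marker = os.path.exists(os.path.join(root, ".ready"))
    return not (has_data or has_marker)
```

Deleting both the `data/` folder and the `.ready` marker would make such a check report that preparation must run again.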

You will then need to prepare the depth data using
[MiDaS](https://github.com/intel-isl/MiDaS). Create a symlink
`data/imagenet_depth` pointing to a folder with two subfolders `train` and
`val`, each mirroring the structure of the corresponding ImageNet folder
described above and containing a `png` file for each of ImageNet's `JPEG`
files. The `png` encodes `float32` depth values obtained from MiDaS as RGBA
images. We provide the script `TODO` to generate this data.
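Since each `png` stores raw `float32` values reinterpreted as four `uint8` channels, the round trip can be sketched as below. The exact byte layout used by the repo's (yet-unreleased) script is an assumption here:

```python
import numpy as np


def depth_to_rgba(depth):
    """Reinterpret the 4 bytes of each float32 depth value as one RGBA
    pixel. `depth` must be a contiguous (H, W) float32 array."""
    assert depth.dtype == np.float32 and depth.ndim == 2
    return depth.view(np.uint8).reshape(*depth.shape, 4)


def rgba_to_depth(rgba):
    """Inverse: view the 4 uint8 channels as a single float32 again."""
    assert rgba.dtype == np.uint8 and rgba.shape[-1] == 4
    return rgba.view(np.float32).reshape(rgba.shape[:2])
```

Saving `depth_to_rgba(d)` as an RGBA PNG (e.g. via PIL's `Image.fromarray(arr, "RGBA")`) is lossless, so the original `float32` map can be recovered exactly on load.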

### CelebA-HQ
Create a symlink `data/celebahq` pointing to a folder containing the `.npy`
files of CelebA-HQ (instructions to obtain them can be found in the [PGGAN
@@ -47,6 +88,12 @@ Download [2020-11-09T13-31-51_sflckr](TODO) and place it into `logs`. Run
```
streamlit run scripts/sample_conditional.py -- -r logs/2020-11-13T21-41-45_faceshq_transformer/
```

### D-RIN
Download [2020-11-20T12-54-32_drin_transformer](TODO) and place it into `logs`. Run
```
streamlit run scripts/sample_conditional.py -- -r logs/2020-11-20T12-54-32_drin_transformer/
```
## Training models

### FacesHQ

@@ -65,6 +112,33 @@ corresponds to the preconfigured checkpoint path), then run
```
python main.py --base configs/faceshq_transformer.yaml -t True --gpus 0,
```

### D-RIN

Train a VQGAN on ImageNet with
```
python main.py --base configs/imagenet_vqgan.yaml -t True --gpus 0,
```

or download a pretrained one from [2020-09-23T17-56-33_imagenet_vqgan](TODO)
and place it under `logs`. If you trained your own, adjust the path in the
config key `model.params.first_stage_config.params.ckpt_path` of
`configs/drin_transformer.yaml`.

Train a VQGAN on ImageNet depth maps with
```
python main.py --base configs/imagenetdepth_vqgan.yaml -t True --gpus 0,
```

or download a pretrained one from [2020-11-03T15-34-24_imagenetdepth_vqgan](TODO)
and place it under `logs`. If you trained your own, adjust the path in the
config key `model.params.cond_stage_config.params.ckpt_path` of
`configs/drin_transformer.yaml`.
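If you prefer to override such a nested key programmatically rather than editing the YAML by hand, a hypothetical dotted-key helper (not part of the repo) might look like this:

```python
def set_key(cfg, dotted, value):
    """Set a dotted key like "model.params.first_stage_config.params.ckpt_path"
    on a nested dict of dicts, creating intermediate levels as needed."""
    *path, leaf = dotted.split(".")
    node = cfg
    for key in path:
        node = node.setdefault(key, {})
    node[leaf] = value
    return cfg
```

With the config loaded as a plain nested dict (e.g. via a YAML loader), `set_key(cfg, "model.params.first_stage_config.params.ckpt_path", "logs/my_vqgan/checkpoints/last.ckpt")` would point the first stage at your own checkpoint; the run-log path here is a made-up example.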

To train the transformer, run
```
python main.py --base configs/drin_transformer.yaml -t True --gpus 0,
```

## Shout-outs
Thanks to everyone who makes their code and models available. In particular,
configs/drin_transformer.yaml (77 additions, 0 deletions)
@@ -0,0 +1,77 @@
```yaml
model:
  base_learning_rate: 4.5e-06
  target: taming.models.cond_transformer.Net2NetTransformer
  params:
    cond_stage_key: depth
    transformer_config:
      target: taming.modules.transformer.mingpt.GPT
      params:
        vocab_size: 1024
        block_size: 512
        n_layer: 24
        n_head: 16
        n_embd: 1024
    first_stage_config:
      target: taming.models.vqgan.VQModel
      params:
        ckpt_path: logs/2020-09-23T17-56-33_imagenet_vqgan/checkpoints/last.ckpt
        embed_dim: 256
        n_embed: 1024
        ddconfig:
          double_z: false
          z_channels: 256
          resolution: 256
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult:
          - 1
          - 1
          - 2
          - 2
          - 4
          num_res_blocks: 2
          attn_resolutions:
          - 16
          dropout: 0.0
        lossconfig:
          target: taming.modules.losses.DummyLoss
    cond_stage_config:
      target: taming.models.vqgan.VQModel
      params:
        ckpt_path: logs/2020-11-03T15-34-24_imagenetdepth_vqgan/checkpoints/last.ckpt
        embed_dim: 256
        n_embed: 1024
        ddconfig:
          double_z: false
          z_channels: 256
          resolution: 256
          in_channels: 1
          out_ch: 1
          ch: 128
          ch_mult:
          - 1
          - 1
          - 2
          - 2
          - 4
          num_res_blocks: 2
          attn_resolutions:
          - 16
          dropout: 0.0
        lossconfig:
          target: taming.modules.losses.DummyLoss

data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 2
    num_workers: 8
    train:
      target: taming.data.imagenet.RINTrainWithDepth
      params:
        size: 256
    validation:
      target: taming.data.imagenet.RINValidationWithDepth
      params:
        size: 256
```
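The `target`/`params` pairs above follow an instantiate-from-config convention: `target` names a class by its dotted import path and `params` supplies its constructor arguments. A simplified sketch of such a helper (the repo's own version may differ in name and detail):

```python
import importlib


def instantiate_from_config(config):
    """Import the class named by config["target"] and construct it with
    config["params"] as keyword arguments (simplified sketch)."""
    module_name, cls_name = config["target"].rsplit(".", 1)
    cls = getattr(importlib.import_module(module_name), cls_name)
    return cls(**config.get("params", {}))
```

For example, `instantiate_from_config({"target": "fractions.Fraction", "params": {"numerator": 1, "denominator": 3}})` builds a `Fraction(1, 3)` purely from the config dict.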

configs/imagenet_vqgan.yaml (42 additions, 0 deletions)
@@ -0,0 +1,42 @@
```yaml
model:
  base_learning_rate: 4.5e-6
  target: taming.models.vqgan.VQModel
  params:
    embed_dim: 256
    n_embed: 1024
    ddconfig:
      double_z: False
      z_channels: 256
      resolution: 256
      in_channels: 3
      out_ch: 3
      ch: 128
      ch_mult: [1, 1, 2, 2, 4]  # num_down = len(ch_mult)-1
      num_res_blocks: 2
      attn_resolutions: [16]
      dropout: 0.0

    lossconfig:
      target: taming.modules.losses.vqperceptual.VQLPIPSWithDiscriminator
      params:
        disc_conditional: False
        disc_in_channels: 3
        disc_start: 250001
        disc_weight: 0.8
        codebook_weight: 1.0

data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 3
    num_workers: 8
    train:
      target: taming.data.imagenet.ImageNetTrain
      params:
        config:
          size: 256
    validation:
      target: taming.data.imagenet.ImageNetValidation
      params:
        config:
          size: 256
```

configs/imagenetdepth_vqgan.yaml (41 additions, 0 deletions)
@@ -0,0 +1,41 @@
```yaml
model:
  base_learning_rate: 4.5e-6
  target: taming.models.vqgan.VQModel
  params:
    embed_dim: 256
    n_embed: 1024
    image_key: depth
    ddconfig:
      double_z: False
      z_channels: 256
      resolution: 256
      in_channels: 1
      out_ch: 1
      ch: 128
      ch_mult: [1, 1, 2, 2, 4]  # num_down = len(ch_mult)-1
      num_res_blocks: 2
      attn_resolutions: [16]
      dropout: 0.0

    lossconfig:
      target: taming.modules.losses.vqperceptual.VQLPIPSWithDiscriminator
      params:
        disc_conditional: False
        disc_in_channels: 1
        disc_start: 50001
        disc_weight: 0.75
        codebook_weight: 1.0

data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 3
    num_workers: 8
    train:
      target: taming.data.imagenet.ImageNetTrainWithDepth
      params:
        size: 256
    validation:
      target: taming.data.imagenet.ImageNetValidationWithDepth
      params:
        size: 256
```

taming/data/base.py (2 additions, 2 deletions)
@@ -21,11 +21,11 @@ def __getitem__(self, idx):
```diff
 class ImagePaths(Dataset):
-    def __init__(self, paths, size=None, random_crop=False):
+    def __init__(self, paths, size=None, random_crop=False, labels=None):
         self.size = size
         self.random_crop = random_crop

-        self.labels = dict()
+        self.labels = dict() if labels is None else labels
         self.labels["file_path_"] = paths
         self._length = len(paths)
```
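The new `labels` argument lets callers pass pre-computed per-file metadata (e.g. class labels) that the dataset then extends with its own `file_path_` entry. A minimal stand-in illustrating that behavior (not the full `ImagePaths` class, which also handles loading and cropping):

```python
class ImagePathsSketch:
    """Simplified stand-in for ImagePaths, showing only the labels logic."""

    def __init__(self, paths, labels=None):
        # Start from caller-provided labels (or an empty dict) and
        # always record the file paths under "file_path_".
        self.labels = dict() if labels is None else labels
        self.labels["file_path_"] = paths
        self._length = len(paths)

    def __len__(self):
        return self._length
```

Callers can now do e.g. `ImagePathsSketch(paths, labels={"class_label": class_ids})`, and each example's metadata travels alongside its file path.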
