diff --git a/.idea/workspace.xml b/.idea/workspace.xml index cebb463..5926fb3 100644 --- a/.idea/workspace.xml +++ b/.idea/workspace.xml @@ -1,7 +1,10 @@ - + + + + - { + "keyToString": { + "RunOnceActivity.OpenProjectViewOnStart": "true", + "RunOnceActivity.ShowReadmeOnStart": "true", + "SHARE_PROJECT_CONFIGURATION_FILES": "true", + "WebServerToolWindowFactoryState": "true", + "last_opened_file_path": "/Users/jiexiao/sourcecode/IDT", + "nodejs_package_manager_path": "npm" } -}]]> +} @@ -45,7 +48,12 @@ 1657510063115 - + + + + + + diff --git a/README.md b/README.md index 7b2b2c6..262c544 100644 --- a/README.md +++ b/README.md @@ -1,14 +1,14 @@ -# [Image Deraining Transformer](https://ieeexplore.ieee.org/document/9798773) (IDT) +# [Image Deraining Transformer](https://ieeexplore.ieee.org/document/9798773) + Jie Xiao, Xueyang Fu, Aiping Liu, Feng Wu, Zheng-Jun Zha
### Update: * **2022.09.21**: Add test code for full size images: test_full_size.py - - -> **Abstract:** *Existing deep learning based de-raining approaches have resorted to the convolutional architectures. However, the intrinsic limitations of convolution, including local receptive fields and independence of input content, hinder the model's ability to capture long-range and complicated rainy artifacts. To overcome these limitations, we propose an effective and efficient transformer-based architecture for the image de-raining. Firstly, we introduce general priors of vision tasks, i.e., locality and hierarchy, into the network architecture so that our model can achieve excellent de-raining performance without costly pre-training. Secondly, since the geometric appearance of rainy artifacts is complicated and of significant variance in space, it is essential for de-raining models to extract both local and non-local features. Therefore, we design the complementary window-based transformer and spatial transformer to enhance locality while capturing long-range dependencies. Besides, to compensate for the positional blindness of self-attention, we establish a separate representative space for modeling positional relationship, and design a new relative position enhanced multi-head self-attention. In this way, our model enjoys powerful abilities to capture dependencies from both content and position, so as to achieve better image content recovery while removing rainy artifacts. Experiments substantiate that our approach attains more appealing results than state-of-the-art methods quantitatively and qualitatively.*
+> **Abstract:** *Existing deep learning based de-raining approaches have resorted to the convolutional architectures. However, the intrinsic limitations of convolution, including local receptive fields and independence of input content, hinder the model's ability to capture long-range and complicated rainy artifacts. To overcome these limitations, we propose an effective and efficient transformer-based architecture for the image de-raining. Firstly, we introduce general priors of vision tasks, i.e., locality and hierarchy, into the network architecture so that our model can achieve excellent de-raining performance without costly pre-training. Secondly, since the geometric appearance of rainy artifacts is complicated and of significant variance in space, it is essential for de-raining models to extract both local and non-local features. Therefore, we design the complementary window-based transformer and spatial transformer to enhance locality while capturing long-range dependencies. Besides, to compensate for the positional blindness of self-attention, we establish a separate representative space for modeling positional relationship, and design a new relative position enhanced multi-head self-attention. In this way, our model enjoys powerful abilities to capture dependencies from both content and position, so as to achieve better image content recovery while removing rainy artifacts. Experiments substantiate that our approach attains more appealing results than state-of-the-art methods quantitatively and qualitatively.* + ## Method ![IDT](fig/architecture.png) @@ -29,6 +29,12 @@ - AGAN-Data: [Link](https://github.com/rui1996/DeRaindrop) +## Merging of Inferenced Patches +To evaluate an image of arbitrary size, we first split the image into overlapping $128\times 128$ patches, and then merge the evaluated patches back to the original resolution. +Compared with directly averaging the overlapped zone, our testing procedure helps to mitigate block artifacts.
Please see test_full_size.py for implementation. +![Merge](fig/merge.png) + + ## Demo To test the pre-trained IDT model on full size images: diff --git a/fig/merge.png b/fig/merge.png new file mode 100644 index 0000000..865caba Binary files /dev/null and b/fig/merge.png differ
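The README text added in this diff describes splitting an image into overlapping $128\times 128$ patches and merging the results, but it does not spell out the merging rule beyond saying it is not a plain average. The sketch below is one possible way to do such a merge, not necessarily what `test_full_size.py` does. Everything in it is an assumption made for illustration: the function names (`patch_starts`, `blend_window`, `merge_patches`), the stride of 64, the triangular weighting, and the `H x W x C` NumPy layout. The `model` argument is a hypothetical callable that maps a patch to a restored patch of the same shape.

```python
import numpy as np


def patch_starts(length, patch, stride):
    """Start offsets that cover [0, length); the last patch sits flush with the edge."""
    return list(range(0, length - patch, stride)) + [length - patch]


def blend_window(patch):
    """Separable triangular weights: largest at the patch centre, smallest (but nonzero) at the border."""
    ramp = np.minimum(np.arange(1, patch + 1), np.arange(patch, 0, -1)).astype(np.float64)
    return np.outer(ramp, ramp)


def merge_patches(image, model, patch=128, stride=64):
    """Run `model` on overlapping patches of an H x W x C image and blend the outputs.

    Assumed weighting (not taken from the repository): each output pixel is a
    weighted mean of all patches covering it, so patch borders contribute less
    than patch centres.
    """
    h, w = image.shape[:2]
    if h < patch or w < patch:
        raise ValueError("image must be at least one patch in each dimension")

    out = np.zeros(image.shape, dtype=np.float64)
    weight = np.zeros((h, w), dtype=np.float64)
    win = blend_window(patch)

    for y in patch_starts(h, patch, stride):
        for x in patch_starts(w, patch, stride):
            pred = model(image[y:y + patch, x:x + patch])
            # Accumulate the weighted prediction and the weights themselves.
            out[y:y + patch, x:x + patch] += pred * win[..., None]
            weight[y:y + patch, x:x + patch] += win

    # Every pixel is covered by at least one patch, so weight is strictly positive.
    return out / weight[..., None]
```

With an identity `model` (for example `lambda p: p`), the merged output reproduces the input up to floating-point error, which is a quick sanity check that the weights are normalised correctly. Setting every weight to 1 would reduce this to the direct averaging that the README contrasts against.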