Pyramid Vision Transformer Version 2 + ResNet18 #857

khawar-islam · 2021-09-07T05:04:01Z

Thank you for your hard work. Would it be possible to use ResNet18 for initial feature extraction and then pass the features to Pyramid Vision Transformer Version 2 (PVT v2)? PVT is one of the most accurate ViT variants. I have tried to combine the two, but it causes a lot of problems because PVT has four stages.

https://github.com/whai362/PVT/blob/v2/classification/pvt_v2.py

Answered by rwightman

rwightman · 2021-09-07T06:28:36Z

rwightman
Sep 7, 2021
Maintainer

@khawar512 this is more of a question than a bug or feature request. A full ResNet18 wouldn't work there due to the stride constraints of the stages. You could probably make it a bit more convolutional by changing the patch embed layers to larger stacks of, say, 3-4 3x3 convs while keeping the stride constrained. Given how many other CNN-Transformer hybrids there are out there, I wouldn't be surprised if that's already one of the variants I've seen...
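To make the stride constraint concrete, here is a minimal sketch in plain Python (no PyTorch needed) that only tracks spatial sizes using the standard convolution output-size formula. The PVT v2 patch embed settings (a 7x7 conv with stride 4 in stage 1, then 3x3 convs with stride 2) are taken from the linked `pvt_v2.py`, and the ResNet18 entries list only its downsampling layers. The conv stack layout in `CONV_STACK_EMBEDS` is a hypothetical illustration of the suggestion above, not something from the thread or the PVT repository, and all function and variable names are made up for this example.

```python
def conv_out(size, kernel, stride, padding):
    """Output size of a conv or pooling layer along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def run_layers(size, layers):
    """Apply a sequence of (kernel, stride, padding) layers to a spatial size."""
    for kernel, stride, padding in layers:
        size = conv_out(size, kernel, stride, padding)
    return size


def pyramid(size, stages):
    """Return the feature map size after each stage's patch embedding."""
    sizes = []
    for layers in stages:
        size = run_layers(size, layers)
        sizes.append(size)
    return sizes


# PVT v2 overlapping patch embeds: one conv per stage, (kernel, stride, padding).
PVT_V2_PATCH_EMBEDS = [
    [(7, 4, 3)],
    [(3, 2, 1)],
    [(3, 2, 1)],
    [(3, 2, 1)],
]

# Hypothetical replacement: a stack of three 3x3 convs per stage whose
# strides multiply to the same per-stage stride (4, 2, 2, 2).
CONV_STACK_EMBEDS = [
    [(3, 2, 1), (3, 2, 1), (3, 1, 1)],
    [(3, 2, 1), (3, 1, 1), (3, 1, 1)],
    [(3, 2, 1), (3, 1, 1), (3, 1, 1)],
    [(3, 2, 1), (3, 1, 1), (3, 1, 1)],
]

# Downsampling layers of a full ResNet18: 7x7 stem conv, 3x3 max pool,
# then the stride-2 3x3 conv that opens each of layer2, layer3 and layer4.
RESNET18_DOWNSAMPLING = [
    (7, 2, 3),
    (3, 2, 1),
    (3, 2, 1),
    (3, 2, 1),
    (3, 2, 1),
]

if __name__ == "__main__":
    print("PVT v2 patch embeds:", pyramid(224, PVT_V2_PATCH_EMBEDS))
    print("conv stack embeds:  ", pyramid(224, CONV_STACK_EMBEDS))
    after_resnet = run_layers(224, RESNET18_DOWNSAMPLING)
    print("after full ResNet18:", after_resnet)
    print("PVT v2 on top of it:", pyramid(after_resnet, PVT_V2_PATCH_EMBEDS))
```

For a 224x224 input, both the original patch embeds and the conv stacks give the same 56, 28, 14, 7 pyramid, so the transformer blocks in each stage see the resolutions they expect. A full ResNet18 has already reduced the input by a factor of 32 before PVT's first stage, which leaves the four stages with almost nothing to downsample.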
