Pyramid Vision Transformer Version 2 + ResNet18 #857
Dear @rwightman, thank you for your hard work. Would it be possible to use ResNet18 for initial feature extraction and then pass the features to Pyramid Vision Transformer Version 2 (PVT v2)? PVT is one of the strongest ViTs in terms of accuracy. I have tried this many times, but it causes a lot of problems because PVT has four stages. https://github.com/whai362/PVT/blob/v2/classification/pvt_v2.py
Replies: 1 comment
@khawar512 this is more of a question than a bug or feature request. A full ResNet18 wouldn't work there due to the stride constraints of the stages. You could probably make it a bit more convolutional by changing the patch embed layers to larger stacks of, say, 3-4 3x3 convs while keeping the stride constrained. Given how many other CNN-Transformer hybrids are out there, I wouldn't be surprised if that's already one of the variants I've seen...
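To make the stride constraint concrete, here is a rough sketch in plain Python that tracks only spatial sizes, using the standard conv output-size formula. The layer settings are assumptions: the PVT v2 patch-embed kernels and strides are taken from the linked pvt_v2.py, the ResNet18 downsampling path is the usual torchvision-style layout, and `CONV_STACK_EMBEDS` is a hypothetical illustration of the "stack of 3x3 convs" idea, not an existing timm variant.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a conv/pool layer along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def run_stack(size, layers):
    """Apply a list of (kernel, stride, padding) layers in order."""
    for kernel, stride, padding in layers:
        size = conv_out(size, kernel, stride, padding)
    return size


def stage_sizes(size, embeds):
    """Feature-map size after each stage's patch embed (one stack per stage)."""
    sizes = []
    for stack in embeds:
        size = run_stack(size, stack)
        sizes.append(size)
    return sizes


# Assumed PVT v2 patch embeds: 7x7 stride-4 conv, then three 3x3 stride-2 convs.
PVT_V2_EMBEDS = [[(7, 4, 3)], [(3, 2, 1)], [(3, 2, 1)], [(3, 2, 1)]]

# Assumed ResNet18 downsampling layers only: 7x7 stride-2 stem, 3x3 stride-2
# max pool, then the stride-2 3x3 convs entering layer2, layer3 and layer4.
RESNET18_DOWNSAMPLING = [(7, 2, 3), (3, 2, 1), (3, 2, 1), (3, 2, 1), (3, 2, 1)]

# Hypothetical conv-stack embeds: three 3x3 convs per stage, with strides
# chosen so each stage keeps the same overall stride as in PVT v2 (4, 2, 2, 2).
CONV_STACK_EMBEDS = [
    [(3, 2, 1), (3, 1, 1), (3, 2, 1)],
    [(3, 1, 1), (3, 1, 1), (3, 2, 1)],
    [(3, 1, 1), (3, 1, 1), (3, 2, 1)],
    [(3, 1, 1), (3, 1, 1), (3, 2, 1)],
]

pvt_sizes = stage_sizes(224, PVT_V2_EMBEDS)
resnet_out = run_stack(224, RESNET18_DOWNSAMPLING)
pvt_after_resnet = stage_sizes(resnet_out, PVT_V2_EMBEDS)
conv_stack_sizes = stage_sizes(224, CONV_STACK_EMBEDS)

print("PVT v2 stages on a 224 input:", pvt_sizes)
print("Full ResNet18 output size:", resnet_out)
print("PVT v2 stages fed by full ResNet18:", pvt_after_resnet)
print("Conv-stack embeds on a 224 input:", conv_stack_sizes)
```

Under these assumptions a full ResNet18 already uses up the whole 32x downsampling budget, so the four PVT stages that follow have almost nothing left to reduce, while the conv-stack embeds reproduce the original per-stage resolutions.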