
Pyramid Vision Transformer Version 2 + ResNet18 #857

Answered by rwightman
khawar-islam asked this question in Q&A

@khawar512 this is more of a question than a bug or feature request. A full resnet18 wouldn't work there due to the stride constraints of the stages. You could probably make it a bit more convolutional by changing the patch embed layers to larger stacks of, say, 3-4 3x3 convs while keeping the stride constrained. Given how many other CNN-Transformer hybrids there are out there, I wouldn't be surprised if that's already one of the variants I've seen...
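
To make the stride constraint concrete, here is a minimal sketch in plain Python (no framework required) that plans a stack of 3x3 convs for each patch embed and checks the resulting feature map sizes. It assumes PVTv2-style per-stage strides of 4, 2, 2, 2 and a 224x224 input; `stem_plan` and `apply_plan` are hypothetical helper names for illustration, not part of timm or the PVTv2 code. For comparison, a full ResNet18 downsamples by 32 overall, which is more than the stride budget of any single patch embed.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of one conv layer (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def stem_plan(total_stride, depth=3):
    """Hypothetical helper: plan `depth` 3x3 convs whose strides multiply
    to `total_stride`. Stride-2 layers come first, the rest use stride 1.
    Returns a list of (kernel, stride, padding) tuples.
    """
    plan = []
    remaining = total_stride
    for _ in range(depth):
        stride = 2 if remaining > 1 and remaining % 2 == 0 else 1
        remaining //= stride
        plan.append((3, stride, 1))
    if remaining != 1:
        raise ValueError("total_stride must be a power of two reachable within depth")
    return plan


def apply_plan(size, plan):
    """Run a spatial size through every layer of the plan."""
    for kernel, stride, padding in plan:
        size = conv_out(size, kernel, stride, padding)
    return size


# Assumption: PVTv2-style strides, 4 for the first patch embed, then 2 each.
stage_strides = [4, 2, 2, 2]
size = 224
sizes = []
for s in stage_strides:
    size = apply_plan(size, stem_plan(s, depth=3))
    sizes.append(size)

print(sizes)  # [56, 28, 14, 7]
```

Each tuple in a plan maps directly onto the kernel size, stride, and padding of a conv layer, so the same plan could drive a conv stack in whatever framework the model uses. The output sizes match what a single strided patch embed conv would produce, which is the constraint the later stages depend on.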

Replies: 1 comment

Answer selected by khawar-islam
Category
Q&A
Labels
enhancement New feature or request
2 participants
Converted from issue

This discussion was converted from issue #856 on September 07, 2021 06:25.