Replies: 4 comments
-
@bpfliegel there is no pre-set feature for that... You can use the feature extraction functionality in a general-purpose way: if you determine the other points of interest for extraction, you can use the module names to get features anywhere. Injecting inputs back into the network is something else entirely and would add significant complexity to support across models. Probably the only way to try this without modifying significant parts of a model would be to use FX to rewrite the model based on the points where you want to inject/fuse and extract.
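For the extraction side, torchvision's FX-based helper covers the "module names" route on most timm models. A minimal sketch (the `layer1`/`layer2` names are ResNet-specific assumptions; list valid names for your backbone with `get_graph_node_names`):

```python
import torch
import timm
from torchvision.models.feature_extraction import (
    create_feature_extractor,
    get_graph_node_names,
)

model = timm.create_model('resnet18', pretrained=False)

# Inspect the symbolically traced graph for valid extraction points
train_nodes, eval_nodes = get_graph_node_names(model)

# 'layer1'/'layer2' are ResNet-specific; pick names from the listing above
extractor = create_feature_extractor(
    model, return_nodes={'layer1': 'stride4', 'layer2': 'stride8'})
feats = extractor(torch.randn(1, 3, 224, 224))
print({k: v.shape for k, v in feats.items()})
# {'stride4': torch.Size([1, 64, 56, 56]), 'stride8': torch.Size([1, 128, 28, 28])}
```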
-
Moving to discussions, as this is not a bug and is outside the scope of a feature request at this time...
-
Thanks for your kind answer @rwightman, really appreciate it. I will look into torch.fx to see if I can do that! Thanks so much again, Balint
-
Torch.FX has limitations, and some networks have their own unique madness, but this kind of graph rewrite works most of the time if someone needs it. Quick and dirty solution, sorry for that. Thanks for the idea @rwightman on torch.fx!
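A rough sketch along those lines, assuming a traceable ResNet-style timm backbone (the `layer1` fusion point and the extra-input name are illustrative assumptions, not a general recipe):

```python
import torch
import torch.fx as fx
import timm

model = timm.create_model('resnet18', pretrained=False)
gm = fx.symbolic_trace(model)

# Append a new placeholder (extra input) after the existing ones
placeholders = [n for n in gm.graph.nodes if n.op == 'placeholder']
with gm.graph.inserting_after(placeholders[-1]):
    extra = gm.graph.placeholder('extra_stride4')

# Fuse the extra input into the output of 'layer1' (ResNet-specific name)
for node in gm.graph.nodes:
    if node.op == 'call_module' and node.target == 'layer1':
        with gm.graph.inserting_after(node):
            fused = gm.graph.call_function(torch.add, args=(node, extra))
        # Reroute every downstream consumer of layer1 through the add...
        node.replace_all_uses_with(fused)
        # ...which also rewrote the add's own first operand; restore it
        fused.update_arg(0, node)
        break

gm.graph.lint()
gm.recompile()

out = gm(torch.randn(1, 3, 224, 224),
         torch.randn(1, 64, 56, 56))  # extra input matches layer1's output
```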
-
I am generally fine (and quite happy) with the feature extraction of models as per: https://rwightman.github.io/pytorch-image-models/feature_extraction/
However, I can't figure out how to solve something generically, for any backbone:
Think of a multimodal semantic segmentation task, assuming we have RGB and D as inputs. Both RGB and D are sent through their respective encoder branches (one backbone each), we fuse features at strides 2, 4, 8, etc., and apply some decoder. This approach is very easy to do.
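For concreteness, a minimal sketch of that easy two-branch case using timm's `features_only` API (the model names and concat-based fusion are illustrative assumptions, not the models I actually use):

```python
import torch
import timm

# One backbone per modality; features_only=True returns a feature pyramid
rgb_enc = timm.create_model('resnet18', features_only=True, pretrained=False)
d_enc = timm.create_model('resnet18', features_only=True, pretrained=False,
                          in_chans=1)  # depth as a single channel

rgb, depth = torch.randn(1, 3, 224, 224), torch.randn(1, 1, 224, 224)

# Naive fusion by concatenation at each stride; a real model would use
# learned fusion blocks here
fused = [torch.cat([fr, fd], 1) for fr, fd in zip(rgb_enc(rgb), d_enc(depth))]
print([f.shape for f in fused])  # strides 2, 4, 8, 16, 32 for ResNet
```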
Okay, now imagine that we have a third encoder branch which does not take any input image, but receives the fusion product of the RGB and D branches at stride 2. Taking that as its stride-2 input, it invokes all its modules to arrive at stride 4; at stride 4 it fuses its features with the stride-4 features of the RGB and D branches and continues to stride 8, and so on.
As far as I am aware, timm generally assumes that we have an input at the start of the network, and we parametrize timm to give us the outputs at specific strides. I am afraid there is no such feature to say 'hey, this is the input for stride 2, give me the output at stride 4'. Or is there a way, but I failed to find it? It would be really useful to support fusion architectures.
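Concretely, the closest workaround I can see is calling a backbone's stage modules directly; a rough sketch for a ResNet-style timm model (the attribute names, channel counts and shapes are ResNet-specific assumptions, not something general):

```python
import torch
import timm

aux = timm.create_model('resnet18', pretrained=False)  # third branch

# Pretend this is the stride-2 fusion product of the RGB and D branches;
# it must match the shape of aux's own stem output: (B, 64, H/2, W/2)
fused_s2 = torch.randn(1, 64, 112, 112)

x = aux.maxpool(fused_s2)  # stride 2 -> stride 4
s4 = aux.layer1(x)         # this branch's stride-4 features
# ... fuse s4 with the stride-4 features of the other branches, then:
s8 = aux.layer2(s4)        # stride 4 -> stride 8, and so on
```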
I would welcome any ideas on how to do that, if it's possible.
Thanks a lot,
Balint