I have been trying to use TimeSformer and ViViT. I managed to convert them into regression models by changing the loss function and setting the output dimension of the MLP head to 1. However, as I understand it, a video vision transformer takes a video clip as input (broken into frames) and outputs a single value for that clip. I would like the model to output a value for each frame of the input clip, so instead of outputting 1 value it would output 32 values. Can you guide me in this regard?
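For reference, one possible way to get per-frame outputs is to replace the CLS-token head with a head that pools the patch tokens belonging to each frame and regresses one value per frame. The sketch below is a minimal, self-contained example of that idea; it assumes the encoder returns a token sequence with a leading CLS token followed by `num_frames * patches_per_frame` patch tokens (the actual token layout in TimeSformer/ViViT variants may differ, so adapt the reshape accordingly).

```python
import torch
import torch.nn as nn


class PerFrameRegressionHead(nn.Module):
    """Pools the patch tokens of each frame and regresses one value per frame."""

    def __init__(self, embed_dim: int, num_frames: int, patches_per_frame: int):
        super().__init__()
        self.num_frames = num_frames
        self.patches_per_frame = patches_per_frame
        self.head = nn.Linear(embed_dim, 1)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, 1 + num_frames * patches_per_frame, embed_dim)
        # Drop the CLS token, then group the remaining tokens by frame.
        b = tokens.shape[0]
        x = tokens[:, 1:, :]
        x = x.reshape(b, self.num_frames, self.patches_per_frame, -1)
        # Mean-pool the patch tokens of each frame -> (batch, num_frames, embed_dim)
        x = x.mean(dim=2)
        # One regression value per frame -> (batch, num_frames)
        return self.head(x).squeeze(-1)


# Example: 32-frame clip, 14x14 = 196 patches per frame, 768-dim tokens
if __name__ == "__main__":
    head = PerFrameRegressionHead(embed_dim=768, num_frames=32, patches_per_frame=196)
    tokens = torch.randn(2, 1 + 32 * 196, 768)  # stand-in for the encoder's token output
    out = head(tokens)                           # shape: (2, 32)
    loss = nn.MSELoss()(out, torch.randn(2, 32))
    print(out.shape, loss.item())
```

With a head like this you can train against a per-frame target of shape `(batch, 32)` using MSE, instead of a single value per clip.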
Hi @Taimoor-R, I am also interested in developing a model that does this, and I am likewise trying to figure out how to adjust the model to predict a value per frame (or even per pixel, i.e. dense regression). Have you found a solution in this direction? Thanks for any hint.