ViViT for Regression Tasks #7

Open
Taimoor-R opened this issue Feb 1, 2023 · 1 comment
Comments

@Taimoor-R

I have been trying to use TimeSformer and ViViT, and I have managed to convert them into regression models by changing the loss function and setting the output dimension of the MLP head to 1. However, as I understand it, a video vision transformer takes a video clip (broken into frames) as input and outputs a single value for that clip. I would like the model to output a value for each frame of the input clip, so that instead of outputting 1 value it outputs 32 values. Can you guide me in this regard?
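
For illustration, one possible direction is sketched below in PyTorch. It assumes the backbone can expose one token per frame (for example, the output of the temporal encoder in a factorised-encoder ViViT variant) rather than only a pooled or class token. The class name `PerFrameRegressionHead`, the embedding size of 192, and the random stand-in tokens are all hypothetical and not taken from this repository. If the model uses tubelets spanning several frames, there will be fewer temporal tokens than frames, so the outputs would need to be upsampled or the tubelet depth set to 1.

```python
import torch
import torch.nn as nn


class PerFrameRegressionHead(nn.Module):
    """Hypothetical head: maps one token per frame to one scalar per frame."""

    def __init__(self, embed_dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(embed_dim)
        self.fc = nn.Linear(embed_dim, 1)

    def forward(self, frame_tokens: torch.Tensor) -> torch.Tensor:
        # frame_tokens: (batch, num_frames, embed_dim)
        # nn.Linear acts on the last dimension, so every frame token is
        # regressed with the same shared weights.
        out = self.fc(self.norm(frame_tokens))  # (batch, num_frames, 1)
        return out.squeeze(-1)                  # (batch, num_frames)


# Random stand-in for per-frame tokens; a real backbone would supply these.
batch, num_frames, embed_dim = 2, 32, 192
frame_tokens = torch.randn(batch, num_frames, embed_dim)

head = PerFrameRegressionHead(embed_dim)
preds = head(frame_tokens)                      # one value per frame
targets = torch.randn(batch, num_frames)        # dummy per-frame labels
loss = nn.functional.mse_loss(preds, targets)   # scalar regression loss
```

A simpler alternative is to keep the clip-level token and set the output dimension of the MLP head to 32, paired with a 32-dimensional target. That needs no change to the backbone, but it ties the model to a fixed clip length, whereas a shared per-token head works for any number of frames.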

@BitCalSaul

Hi @Taimoor-R, I am also interested in developing a model that does this, and I am in the process of figuring out how to adjust the model to predict a value for each pixel (in other words, dense regression). Have you found a solution in this direction? Thanks for any hints.
