Generate trajectories (suggest additional ones in comments) #3
Comments
What will "negative" trajectories look like? It seems to me that we can generally use positives from other tasks as negatives, so I would rather propose making 6 diverse positives for each task. Positive descriptions also work well for the "all-versus-all" evaluation, which I outlined in #4.
It seems that the two setups answer two slightly different questions:
All-v-all: We assume all we care about are the different tasks we measure. Then we ask: can the VLM distinguish those from each other?
Pos+neg examples: Assuming the negative examples are good (e.g. you almost see a window but not quite), we ask: can the model reliably recognize this task by itself?
All-v-all is in some ways easier for us, because thinking about what would be a good negative example (and trying to get a lot of them) is a futile task. However, pos+neg is easier in other ways: it might be hard to have trajectories that can only have one label in this env (e.g. you're looking at a window but also inadvertently getting closer to a vase). Maybe I can try doing all-v-all, and if the task overlap is hard to get rid of I'll switch to pos+neg?
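A minimal sketch of how the all-v-all question could be scored, assuming we already have a similarity score between every trajectory and every task description. The function name `all_v_all_confusion`, the score matrix layout, and the toy numbers are all illustrative assumptions, not part of any existing code in this project. The confusion matrix is included because it would show directly which task pairs overlap (the window/vase concern above).

```python
from typing import List, Sequence, Tuple


def all_v_all_confusion(
    scores: Sequence[Sequence[float]],
    true_task: Sequence[int],
) -> Tuple[float, List[List[int]]]:
    """Score an all-v-all evaluation.

    scores[i][j] is an assumed similarity between trajectory i and the
    description of task j (e.g. produced by some VLM; how it is computed
    is outside this sketch). true_task[i] is the index of the task that
    trajectory i was generated for.
    """
    n_tasks = len(scores[0])
    # confusion[t][p] counts trajectories of task t predicted as task p.
    confusion = [[0] * n_tasks for _ in range(n_tasks)]
    correct = 0
    for row, truth in zip(scores, true_task):
        # Predict the task whose description scores highest.
        predicted = max(range(n_tasks), key=lambda j: row[j])
        confusion[truth][predicted] += 1
        correct += int(predicted == truth)
    return correct / len(scores), confusion


# Toy data (made up): 3 trajectories, 3 tasks.
scores = [
    [0.9, 0.1, 0.2],
    [0.3, 0.8, 0.1],
    [0.2, 0.6, 0.5],  # generated for task 2, but task 1 scores higher
]
true_task = [0, 1, 2]
accuracy, confusion = all_v_all_confusion(scores, true_task)
```

Off-diagonal entries in `confusion` would be the signal for switching to pos+neg if the overlap turns out to be hard to remove.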
Yes, this sounds reasonable. I agree that trajectories from other tasks will not necessarily be the hardest negatives, but the hope is that we'll at least have a lot of "medium-hard" negatives. We can think of specialized, good negative examples as an extension of the benchmark, the hard version, and therefore we should focus on them later.
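Reusing positives from other tasks as negatives could be sketched as follows. The function name `negatives_from_other_tasks`, the task names, and the trajectory IDs are hypothetical placeholders for illustration only.

```python
from typing import Dict, List, Tuple


def negatives_from_other_tasks(
    positives: Dict[str, List[str]],
) -> Dict[str, List[Tuple[str, bool]]]:
    """For each task, label its own trajectories True and every other
    task's trajectories False (the "medium-hard" negatives)."""
    labelled: Dict[str, List[Tuple[str, bool]]] = {}
    for task, own in positives.items():
        examples = [(traj, True) for traj in own]
        for other, trajs in positives.items():
            if other != task:
                examples.extend((traj, False) for traj in trajs)
        labelled[task] = examples
    return labelled


# Hypothetical task names and trajectory IDs.
positives = {
    "look_at_window": ["window_0", "window_1"],
    "approach_vase": ["vase_0"],
}
labelled = negatives_from_other_tasks(positives)
```

Note that this labelling is only valid if each trajectory really fits a single task; a trajectory that both looks at the window and approaches the vase would be mislabelled as a negative for one of them.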
Based on the CLIP spatial reasoning article from Slack (https://medium.com/@hendrik.suvalov/evaluating-clip-for-spatial-reasoning-7ffcc8e00f82), it seems like it could be good to have the same task recorded from the clearest possible camera angle and from a more oblique camera angle, to benchmark how robust a model's spatial reasoning is.
Prerequisite: #6.
My initial plan was to generate around 3 "positive" trajectories and 3 "negative" trajectories for all of the tasks below. My current plan is to generate 3 trajectories and 3 alternative descriptions per task.
Priorities:
Ideas: