Generate trajectories (suggest additional ones in comments) #3

Eugleo · 2024-03-04T21:16:55Z

Prerequisite: #6.

~~My current plan is to generate around 3 "positive" trajectories and 3 "negative" trajectories for each of the tasks below.~~

My current plan is to generate 3 trajectories and 3 alternative descriptions per task.
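One possible use of the three alternative descriptions is to score a trajectory against a task by averaging its similarity over all of that task's descriptions, so that no single phrasing dominates. The sketch below assumes the VLM already yields one embedding per trajectory and one per description (plain lists of floats here). Cosine similarity is an assumed choice of score, and `cosine`/`task_score` are hypothetical helper names.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def task_score(traj_emb: list[float], desc_embs: list[list[float]]) -> float:
    """Mean similarity of one trajectory to every alternative description of a task."""
    return sum(cosine(traj_emb, d) for d in desc_embs) / len(desc_embs)

# Toy 2-D embeddings standing in for three phrasings of "look at the window".
window_descs = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2]]
print(round(task_score([1.0, 0.0], window_descs), 3))
```

Averaging scores (rather than keeping only the best-matching phrasing) makes the metric less sensitive to one lucky or unlucky wording.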

Priorities:

- Detecting the type of room the agent is in
- Detecting the presence of an object in the frame (a window?)
- Videos from modeled scenes instead of the photorealistic ones
- Walking up and down stairs (video understanding)
- Walking through objects (video understanding)
- Throwing something on the ground vs. it already being on the ground (video understanding)
- Object recognition for out-of-distribution objects (e.g. random objects in a room)
- Spatial reasoning
- Temporal reasoning (together with spatial reasoning, this could form one set of test trajectories)

Ideas:

- Different, more out-of-distribution camera angles
- Detecting proximity to an object (ideally in a scene where we also have ground truth for this)
- Dropping something the agent was holding onto the ground
- Toppling something over
- Pushing something off the top of a table to make it fall
- Natural-looking movement (e.g. walking straight vs. strafing to one side to get somewhere)

Dont-Care-Didnt-Ask · 2024-03-05T18:35:51Z

What will the "negative" trajectories look like?

It seems to me that we can generally use positives from other tasks as negatives, so I would rather propose making 6 diverse positives for each task. Positive descriptions also work well for the "all-versus-all" evaluation I outlined in #4.
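A minimal sketch of an all-versus-all score, assuming a precomputed matrix where rows are trajectories and columns are tasks, and a trajectory counts as correct when its own task scores highest (top-1 accuracy). This is an illustration only, not necessarily the exact protocol from #4; the function name and toy numbers are hypothetical.

```python
def all_v_all_accuracy(scores: list[list[float]], labels: list[int]) -> float:
    """Top-1 accuracy: fraction of trajectories whose highest-scoring task is their own.

    scores[i][j] is the VLM score of trajectory i against task j;
    labels[i] is the index of the task trajectory i was recorded for.
    """
    correct = 0
    for row, label in zip(scores, labels):
        predicted = max(range(len(row)), key=lambda j: row[j])
        correct += predicted == label
    return correct / len(labels)

# Toy example: 3 tasks, 2 trajectories each (6 rows).
scores = [
    [0.9, 0.2, 0.1],
    [0.6, 0.7, 0.1],  # this task-0 trajectory is confused with task 1
    [0.1, 0.8, 0.3],
    [0.2, 0.9, 0.4],
    [0.1, 0.3, 0.7],
    [0.2, 0.1, 0.5],
]
labels = [0, 0, 1, 1, 2, 2]
print(all_v_all_accuracy(scores, labels))
```

Every other task's trajectories act as negatives here, so no dedicated negative trajectories have to be recorded.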

Eugleo · 2024-03-05T18:52:18Z

It seems that the two setups answer two slightly different questions:

All-v-all: We assume all we care about are the different tasks we measure. Then we ask: Can the VLM distinguish those from each other?

Pos+Neg examples: Assuming the neg examples are good (e.g. you almost see a window but not quite), we answer the question: Can the model recognize this task by itself, reliably?

All-v-all is in some ways easier for us, because thinking about what would make a good negative example (and trying to get a lot of them) is a futile task.

However, pos+neg is easier in other ways: in this environment it might be hard to record trajectories that have only one valid label (e.g. you're looking at a window but also inadvertently getting closer to a vase).

Maybe I can try doing all-v-all, and if the task overlap is hard to get rid of I'll switch to pos+neg?

Dont-Care-Didnt-Ask · 2024-03-05T18:58:54Z

Yes, this sounds reasonable. I agree that trajectories from other tasks will not necessarily be the hardest negatives, but the hope is that at least we'll have a lot of "medium-hard" negatives.

We can treat specialized, good negative examples as an extension of the benchmark: its hard version (so we should focus on them later).

evgunter · 2024-03-06T20:00:52Z

Based on the CLIP spatial-reasoning article from Slack (https://medium.com/@hendrik.suvalov/evaluating-clip-for-spatial-reasoning-7ffcc8e00f82), it could be good to record the same task from both the clearest possible camera angle and a more oblique one, to benchmark how robust the model's spatial reasoning is.
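A sketch of how such paired recordings could be scored: record each task from both angles, check whether the model gets each recording right, and report the accuracy drop. The per-recording correctness flags are assumed to come from whichever evaluation (all-v-all or pos+neg) is used; the function name is hypothetical.

```python
def angle_robustness_gap(clear_correct: list[bool], oblique_correct: list[bool]) -> float:
    """Accuracy drop from the clearest camera angle to the oblique one.

    Entry i in both lists refers to the same task, recorded from two angles;
    a value of 0.0 means the model is equally good from both viewpoints.
    """
    if len(clear_correct) != len(oblique_correct):
        raise ValueError("each task needs both a clear and an oblique recording")
    clear_acc = sum(clear_correct) / len(clear_correct)
    oblique_acc = sum(oblique_correct) / len(oblique_correct)
    return clear_acc - oblique_acc

# Toy results for four spatial-reasoning tasks.
clear = [True, True, True, False]
oblique = [True, False, False, False]
print(angle_robustness_gap(clear, oblique))
```

Pairing recordings by task keeps the comparison about the viewpoint rather than about which tasks happen to be in each set.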

Eugleo added this to MATS Mar 4, 2024

Eugleo converted this from a draft issue Mar 4, 2024

Eugleo changed the title ~~Manually generate trajectories~~ Manually generate trajectories (suggest which ones in comments) Mar 4, 2024

Eugleo changed the title ~~Manually generate trajectories (suggest which ones in comments)~~ Generate trajectories (suggest which ones in comments) Mar 4, 2024

Eugleo mentioned this issue Mar 4, 2024

(🫵 Input needed) Think about how using a RLHF-style proxy reward on top of the VLM influences our plans #6

Closed

Eugleo moved this from Priority to In Progress in MATS Mar 5, 2024

Eugleo changed the title ~~Generate trajectories (suggest which ones in comments)~~ Generate trajectories (suggest additional ones in comments) Mar 5, 2024

Eugleo self-assigned this Mar 5, 2024
