
Generate trajectories (suggest additional ones in comments) #3

Open
6 of 9 tasks
Eugleo opened this issue Mar 4, 2024 · 4 comments
Eugleo commented Mar 4, 2024

Prerequisite: #6.

My current plan is to generate around 3 "positive" and 3 "negative" trajectories for each of the tasks below, plus 3 alternative descriptions per task.

Priorities:

  • Detecting the type of room the agent is in
  • Detecting the presence of an object in the frame (a window?)
  • Videos from modeled scenes instead of photorealistic ones
  • Walking up and down stairs (video understanding)
  • Walking through objects (video understanding)
  • Throwing something on the ground vs it already being on the ground (video understanding)
  • Object recognition for out-of-distribution objects (e.g. random objects in a room)
  • Spatial reasoning
  • Temporal reasoning (this and the one above could share one set of test trajectories)

Ideas:

  • Different, more out-of-distribution angles
  • Detecting proximity to an object (ideally in a scene where we also have GT for this)
  • Dropping something the agent is holding onto the ground.
  • Toppling something over.
  • Pushing something from the top of the table to make it fall.
  • Natural-looking movement (e.g. walking straight vs strafing to one side to get somewhere)
@Eugleo Eugleo added this to MATS Mar 4, 2024
@Eugleo Eugleo converted this from a draft issue Mar 4, 2024
@Eugleo Eugleo changed the title Manually generate trajectories Manually generate trajectories (suggest which ones in comments) Mar 4, 2024
@Eugleo Eugleo changed the title Manually generate trajectories (suggest which ones in comments) Generate trajectories (suggest which ones in comments) Mar 4, 2024
@Eugleo Eugleo moved this from Priority to In Progress in MATS Mar 5, 2024
@Eugleo Eugleo changed the title Generate trajectories (suggest which ones in comments) Generate trajectories (suggest additional ones in comments) Mar 5, 2024
@Eugleo Eugleo self-assigned this Mar 5, 2024
Dont-Care-Didnt-Ask (Collaborator) commented Mar 5, 2024

What will the "negative" trajectories look like?

It seems to me that we can generally use positives from other tasks as negatives, so I would instead propose making 6 diverse positives for each task. Positive descriptions also work well for the "all-versus-all" evaluation I outlined in #4.
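The suggestion of reusing other tasks' positives as negatives could be sketched like this. Everything here is illustrative, not project code: the `positives_by_task` layout (task name to list of trajectory IDs) and the helper itself are assumptions.

```python
import random


def build_negatives(positives_by_task, n_negatives=6, seed=0):
    """For each task, sample negatives from the positives of *other* tasks.

    `positives_by_task` maps a task name to a list of trajectory IDs
    (a hypothetical layout for illustration).
    """
    rng = random.Random(seed)
    negatives = {}
    for task in positives_by_task:
        # Pool together every positive that belongs to a different task.
        pool = [
            traj
            for other, trajs in positives_by_task.items()
            if other != task
            for traj in trajs
        ]
        negatives[task] = rng.sample(pool, min(n_negatives, len(pool)))
    return negatives
```

By construction, a task's own positives never end up in its negative pool, which is exactly the "medium-hard negatives for free" property discussed here.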

Eugleo (Owner, Author) commented Mar 5, 2024

It seems that with the two different setups we answer two slightly different questions:

All-v-all: We assume all we care about are the different tasks we measure. Then we ask: Can the VLM distinguish those from each other?

Pos+neg examples: Assuming the negative examples are good (e.g. you almost see a window, but not quite), we answer the question: can the model recognize this task by itself, reliably?

All-v-all is in some ways easier for us, because thinking up good negative examples (and collecting many of them) is a futile task.

However, pos+neg is easier in other ways: with all-v-all it might be hard to find trajectories that admit only one label in this environment (e.g. you're looking at a window while also inadvertently getting closer to a vase).

Maybe I'll try all-v-all first, and if the task overlap proves hard to get rid of, I'll switch to pos+neg?
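For the all-v-all setup, the accuracy computation could be as minimal as the sketch below. The score-matrix layout is an assumption for illustration: `score[i][j]` is the model's score for trajectory `i` against the description of task `j`, with trajectory `i` belonging to task `i`.

```python
def all_v_all_accuracy(score):
    """Fraction of trajectories whose own task's description scores highest.

    `score` is a square list of lists; `score[i][j]` is the model's score
    for trajectory i against task j's description (hypothetical layout).
    """
    correct = 0
    for i, row in enumerate(score):
        # argmax over task descriptions for this trajectory
        if max(range(len(row)), key=row.__getitem__) == i:
            correct += 1
    return correct / len(score)
```

This is where the single-label concern bites: if a "window" trajectory also legitimately matches the "vase" description, the argmax can be "wrong" without the model being wrong.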

Dont-Care-Didnt-Ask (Collaborator)

Yes, this sounds reasonable. I agree that trajectories from other tasks will not necessarily be the hardest negatives, but the hope is that we'll at least have a lot of "medium-hard" negatives.

We can think of specialized, good negative examples as an extension of the benchmark (its hard version), and therefore focus on them later.

evgunter (Collaborator) commented Mar 6, 2024

Based on the CLIP spatial-reasoning article from Slack (https://medium.com/@hendrik.suvalov/evaluating-clip-for-spatial-reasoning-7ffcc8e00f82), it could be useful to record the same task both from the clearest possible camera angle and from a more oblique one, to benchmark how robust the model's spatial reasoning is.
