Add PhysicsIQ benchmark reproduction cookbook for Cosmos3#194

Open
akashgokul wants to merge 1 commit into NVIDIA:main from akashgokul:feature/physicsiq-benchmark-notebook

@akashgokul

Collaborator

Adds an end-to-end notebook reproducing the PhysicsIQ benchmark with Cosmos3-Super (and Cosmos3-Nano) via the native cosmos-framework PyTorch entrypoint. Covers both I2V and V2V task formats with verified reference scores (I2V: 43.8, V2V: 59.7). Also adds the prompts we used for I2V and V2V in assets.

@akashgokul akashgokul marked this pull request as draft June 5, 2026 21:00
@akashgokul akashgokul force-pushed the feature/physicsiq-benchmark-notebook branch 3 times, most recently from 90346ea to 79ddcae Compare June 5, 2026 21:31
@akashgokul akashgokul marked this pull request as ready for review June 8, 2026 15:37
@akashgokul akashgokul force-pushed the feature/physicsiq-benchmark-notebook branch from 79ddcae to 83f1ac9 Compare June 8, 2026 19:17
@akashgokul

Collaborator Author

@lfengad @Dinghow Could I get a review pls?

@lfengad lfengad left a comment

Collaborator

Should we move these into cookbooks/cosmos3/generator/physicsiq/ for consistency of the structure?

" -i \"$V2V_FULL_INPUT\" \\\n",
" -o \"$V2V_FULL_OUTPUT_DIR\" \\\n",
" --checkpoint-path \"$CHECKPOINT\" \\\n",
" --no-guardrails"

Collaborator

Is --no-guardrails needed here? If it really is, could it be noted so that it gets some security attention?

Collaborator Author

@lfengad The notebook uses --no-guardrails because it reproduces our evaluation scores for the Physics-IQ benchmark, and the scores reported in our paper were obtained without guardrails. I believe this is fine: the Physics-IQ prompts are appropriate, and turning guardrails on may cause blurring that could lower the scores.

Please let me know if there is something I should do to handle this (e.g. add a warning notice about --no-guardrails).
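
One possible shape for such a warning, as a minimal sketch only: a small Python helper that assembles the flags shown in the notebook cell above and warns whenever guardrails are switched off. The helper name `build_generation_args` and its `use_guardrails` parameter are hypothetical and not part of cosmos-framework; only the `-i`, `-o`, `--checkpoint-path`, and `--no-guardrails` flags come from the notebook.

```python
import warnings


def build_generation_args(input_path, output_dir, checkpoint, use_guardrails=False):
    """Assemble the generation flags, warning when guardrails are disabled.

    Hypothetical helper: only the flag names are taken from the notebook cell.
    """
    args = ["-i", input_path, "-o", output_dir, "--checkpoint-path", checkpoint]
    if not use_guardrails:
        # Make the benchmark-only nature of this setting visible to the reader.
        warnings.warn(
            "Guardrails are disabled to match the Physics-IQ evaluation setup; "
            "re-enable them for any other use.",
            stacklevel=2,
        )
        args.append("--no-guardrails")
    return args
```

The returned list would be appended to whatever entrypoint command the notebook already uses, so the default reproduces the paper setup while the warning stays in the cell output.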

@akashgokul akashgokul Jun 9, 2026

Collaborator Author

Also, regarding moving these into cookbooks/cosmos3/generator/physicsiq/ for structural consistency: @mingyuliutw asked me to put this in the evaluation folder, as seen in this PR.

Collaborator

Yeah, I think keeping the benchmark part in a separate evaluation folder is more appropriate.

@yy-code-nv

Collaborator

Can you upload the prompt file to Hugging Face and download it from there? It is quite large for a GitHub asset.
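
A rough sketch of what that download step might look like in the notebook, assuming the prompts are published as files in a Hugging Face Hub repository. The repository id, the `fetch_prompts` and `load_prompts` helper names, and the assumption that each JSON file has one top-level entry per case are all hypothetical; `hf_hub_download` is the standard `huggingface_hub` call, and 198 is the per-file prompt count stated in this PR.

```python
import json
from pathlib import Path


def fetch_prompts(repo_id, filename, repo_type="dataset"):
    """Download one prompts file from the Hub and return its local cache path."""
    # Imported lazily so load_prompts works even without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return Path(hf_hub_download(repo_id=repo_id, filename=filename, repo_type=repo_type))


def load_prompts(path, expected_count=198):
    """Load a prompts JSON file and check that it holds the expected number of cases.

    Assumes one top-level entry (list item or dict key) per benchmark case.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if len(data) != expected_count:
        raise ValueError(f"{path}: expected {expected_count} entries, found {len(data)}")
    return data
```

A notebook cell could then call `load_prompts(fetch_prompts("<org>/<repo>", "i2v_prompts.json"))`, with the placeholder repository id replaced by wherever the file ends up being hosted.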

Comment thread evaluation/cosmos3/Physics_IQ/run_with_cosmos_framework.ipynb Outdated
Comment thread evaluation/cosmos3/Physics_IQ/run_with_cosmos_framework.ipynb Outdated
Adds an end-to-end notebook for reproducing the PhysicsIQ benchmark with
Cosmos3-Super using the native cosmos-framework PyTorch entrypoint.

Location: evaluation/cosmos3/Physics_IQ/

Contents:
- run_with_cosmos_framework.ipynb: walks through I2V and V2V task formats
  end-to-end — download the PhysicsIQ dataset, generate, stage, and
  optionally score with the official PhysicsIQ scorer.
- assets/i2v_prompts.json: 198 per-case I2V prompts + negative prompts
- assets/v2v_prompts.json: 198 per-case V2V prompts + negative prompts

Reference scores (Cosmos3-Super): I2V 43.8, V2V 59.7.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@akashgokul akashgokul force-pushed the feature/physicsiq-benchmark-notebook branch from 83f1ac9 to 2b70e51 Compare June 9, 2026 16:39

@Dinghow Dinghow left a comment

Collaborator

LGTM

4 participants