Add PhysicsIQ benchmark reproduction cookbook for Cosmos3 by akashgokul · Pull Request #194 · NVIDIA/cosmos

akashgokul · 2026-06-05T20:50:39Z

Adds an end-to-end notebook reproducing the PhysicsIQ benchmark with Cosmos3-Super (and Cosmos3-Nano) via the native cosmos-framework PyTorch entrypoint. Covers both I2V and V2V task formats with verified reference scores (I2V: 43.8, V2V: 59.7). Also adds the prompts we used for I2V and V2V in assets.

akashgokul · 2026-06-09T05:06:52Z

@lfengad @Dinghow Could I get a review, please?

lfengad

Should we move these into cookbooks/cosmos3/generator/physicsiq/ for structural consistency?

lfengad · 2026-06-09T07:07:32Z

+    "  -i \"$V2V_FULL_INPUT\" \\\n",
+    "  -o \"$V2V_FULL_OUTPUT_DIR\" \\\n",
+    "  --checkpoint-path \"$CHECKPOINT\" \\\n",
+    "  --no-guardrails"


Is --no-guardrails needed here? If it really is, could a note be added to flag it for security attention?

@lfengad --no-guardrails is set because this notebook reproduces our evaluation scores for the Physics-IQ benchmark, and the scores reported in our paper were obtained without guardrails. I believe this is fine: the Physics-IQ prompts are appropriate, and turning guardrails on may cause blurring that could lower the scores.

Please let me know if there is something I should do to handle this (e.g., add a warning notice about --no-guardrails).
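One way the warning notice suggested above could look is sketched below. This is only an illustration: the helper name build_generation_args and its guardrails parameter are hypothetical, and only the flags visible in the reviewed diff (-i, -o, --checkpoint-path, --no-guardrails) are used. The entrypoint command itself is deliberately left out because it is not shown in this thread.

```python
import warnings


def build_generation_args(input_path, output_dir, checkpoint, guardrails=True):
    """Build the argument list seen in the reviewed diff (hypothetical helper).

    Only the flags are returned; the caller prepends the actual entrypoint.
    """
    args = [
        "-i", str(input_path),
        "-o", str(output_dir),
        "--checkpoint-path", str(checkpoint),
    ]
    if not guardrails:
        # Make the choice visible to anyone running the notebook.
        warnings.warn(
            "Guardrails are disabled (--no-guardrails). This is intended only "
            "for reproducing benchmark scores on the Physics-IQ prompts.",
            UserWarning,
            stacklevel=2,
        )
        args.append("--no-guardrails")
    return args
```

Defaulting guardrails to True in this sketch means the benchmark notebook has to opt out explicitly, which keeps the decision visible in the cell that runs generation.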

Also, on moving these into cookbooks/cosmos3/generator/physicsiq/: @mingyuliutw asked me to put this in the evaluation folder, as seen in this PR.

Yeah, I think keeping the benchmark part in a separate evaluation folder is more appropriate.

yy-code-nv · 2026-06-09T08:16:41Z

Can you upload the prompt file to Hugging Face and download it from there? It is quite large for a GitHub asset.
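A minimal sketch of what that could look like in the notebook, assuming the prompt files were uploaded to a Hugging Face dataset repository. The function name load_prompts and the repo_id value a caller would pass are hypothetical (no such repository is named in this PR); hf_hub_download is the standard huggingface_hub download call. A local copy under assets/ is preferred when present.

```python
import json
from pathlib import Path


def load_prompts(filename, local_dir="assets", repo_id=None):
    """Load a prompt JSON file, falling back to the Hugging Face Hub.

    repo_id is a placeholder for a dataset repo that would host the prompts.
    """
    path = Path(local_dir) / filename
    if not path.exists():
        if repo_id is None:
            raise FileNotFoundError(f"{path} not found and no repo_id was given")
        # Imported lazily so the local-file path works without the package.
        from huggingface_hub import hf_hub_download

        path = Path(
            hf_hub_download(repo_id=repo_id, filename=filename, repo_type="dataset")
        )
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```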

Adds an end-to-end notebook for reproducing the PhysicsIQ benchmark with Cosmos3-Super using the native cosmos-framework PyTorch entrypoint.

Location: evaluation/cosmos3/Physics_IQ/

Contents:
- run_with_cosmos_framework.ipynb: walks through I2V and V2V task formats end-to-end (download the PhysicsIQ dataset, generate, stage, and optionally score with the official PhysicsIQ scorer).
- assets/i2v_prompts.json: 198 per-case I2V prompts + negative prompts
- assets/v2v_prompts.json: 198 per-case V2V prompts + negative prompts

Reference scores (Cosmos3-Super): I2V 43.8, V2V 59.7.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
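Since each prompt file is described as holding 198 per-case prompts, a quick sanity check before generation could look like the sketch below. The JSON layout is not shown in this PR, so the check only assumes the top level is a list or an object with one entry per case; the helper name check_prompt_file is hypothetical.

```python
import json


def check_prompt_file(path, expected_cases=198):
    """Return the number of cases in a prompt file, raising on a mismatch.

    Assumes (not confirmed by the PR) one top-level entry per case.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, (list, dict)):
        raise ValueError(f"{path}: expected a JSON list or object at top level")
    n_cases = len(data)
    if n_cases != expected_cases:
        raise ValueError(f"{path}: found {n_cases} cases, expected {expected_cases}")
    return n_cases
```

Running this on assets/i2v_prompts.json and assets/v2v_prompts.json before a full benchmark run would catch a truncated or partially downloaded file early.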

Dinghow

LGTM

akashgokul marked this pull request as draft June 5, 2026 21:00

akashgokul force-pushed the feature/physicsiq-benchmark-notebook branch 3 times, most recently from 90346ea to 79ddcae Compare June 5, 2026 21:31

akashgokul marked this pull request as ready for review June 8, 2026 15:37

akashgokul force-pushed the feature/physicsiq-benchmark-notebook branch from 79ddcae to 83f1ac9 Compare June 8, 2026 19:17

Dinghow requested review from Dinghow, foreverlms, lfengad and yy-code-nv June 9, 2026 07:03

lfengad reviewed Jun 9, 2026

Dinghow reviewed Jun 9, 2026

Comment thread evaluation/cosmos3/Physics_IQ/run_with_cosmos_framework.ipynb Outdated

Comment thread evaluation/cosmos3/Physics_IQ/run_with_cosmos_framework.ipynb Outdated

akashgokul force-pushed the feature/physicsiq-benchmark-notebook branch from 83f1ac9 to 2b70e51 Compare June 9, 2026 16:39

Dinghow approved these changes Jun 10, 2026

akashgokul wants to merge 1 commit into NVIDIA:main from akashgokul:feature/physicsiq-benchmark-notebook
