Add Code Examples to Video Tutorial#1348

Open
suiyoubi wants to merge 6 commits into main from aot/video-tutorial-improv

@suiyoubi suiyoubi commented Jan 2, 2026

Description

Usage

# Add snippet demonstrating usage

Checklist

  • I am familiar with the Contributing Guide.
  • New or Existing tests cover these changes.
  • The documentation is up to date with these changes.

Signed-off-by: Ao Tang <aot@nvidia.com>

copy-pr-bot bot commented Jan 2, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

greptile-apps bot commented Jan 2, 2026

Greptile Overview

Greptile Summary

Overview

This PR significantly expands the video tutorial README from a basic reference to a comprehensive, tutorial-style guide with extensive Python code examples.

Key Changes

  • Added complete Python pipeline example showing how to chain video processing stages programmatically (lines 85-157)
  • Restructured content with clear learning objectives, prerequisites, and expected output examples
  • Added dedicated sections for embedding models, captioning, and metadata schema
  • Improved CLI examples with better context and explanations for each workflow
  • Added comparison table for Cosmos-Embed1 variants (224p/336p/448p) with GPU memory requirements

Issues Found

  • Embedding dimension accuracy: Multiple statements claim both InternVideo2 and Cosmos-Embed1 produce 512-dimensional embeddings. However, the official documentation (docs/curate-video/process-data/dedup.md) states that Cosmos-Embed1 dimensions "vary by variant," suggesting this claim may be incorrect.

Recommendations

Verify and correct the embedding dimension specifications before merging, as incorrect dimensional information could cause runtime errors in downstream tasks like deduplication or clustering.
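One way to catch a dimension mismatch before it reaches deduplication or clustering is a small pre-flight check over the loaded vectors. The sketch below is only an illustration: `embedding_dims` and `check_uniform_dim` are hypothetical helper names, not part of NeMo Curator, and the expected dimension has to come from the model documentation rather than from this thread.

```python
from collections import Counter
from typing import Optional, Sequence


def embedding_dims(embeddings: Sequence[Sequence[float]]) -> Counter:
    """Count how many embeddings have each vector length."""
    return Counter(len(vec) for vec in embeddings)


def check_uniform_dim(
    embeddings: Sequence[Sequence[float]],
    expected_dim: Optional[int] = None,
) -> int:
    """Return the shared dimension, or raise ValueError on a mismatch.

    Hypothetical helper: expected_dim is whatever the chosen model
    variant is documented to produce; it is not hard-coded here.
    """
    dims = embedding_dims(embeddings)
    if not dims:
        raise ValueError("no embeddings to check")
    if len(dims) != 1:
        raise ValueError(f"mixed embedding dimensions: {dict(dims)}")
    (dim,) = dims  # exactly one distinct length remains
    if expected_dim is not None and dim != expected_dim:
        raise ValueError(f"expected {expected_dim}-dim embeddings, got {dim}")
    return dim
```

Running a check like this on the output of each embedding variant would show directly whether the dimensions stated in the README match what the model emits.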

Confidence Score: 3/5

  • Safe to merge after verifying embedding dimensions
  • Documentation improvements are valuable, but contains a factual inaccuracy about embedding dimensions that contradicts official docs and could mislead users
  • Verify embedding dimension claims in tutorials/video/getting-started/README.md at lines 238, 242, 269, and 381

Important Files Changed

| Filename | Overview |
| --- | --- |
| tutorials/video/getting-started/README.md | Comprehensive tutorial rewrite with code examples, adds Python API usage patterns and detailed explanations. Contains potential accuracy issue with embedding dimensions. |

Sequence Diagram

sequenceDiagram
    participant User
    participant Pipeline
    participant VideoReader
    participant Splitter
    participant Transcoder
    participant FrameExtractor
    participant FilterStage
    participant EmbeddingStage
    participant Writer
    
    User->>Pipeline: Initialize pipeline
    User->>Pipeline: add_stage(VideoReader)
    User->>Pipeline: add_stage(FixedStrideExtractorStage/TransNetV2)
    User->>Pipeline: add_stage(ClipTranscodingStage)
    User->>Pipeline: add_stage(ClipFrameExtractionStage)
    User->>Pipeline: add_stage(ClipAestheticFilterStage)
    User->>Pipeline: add_stage(CosmosEmbed1FrameCreationStage)
    User->>Pipeline: add_stage(CosmosEmbed1EmbeddingStage)
    User->>Pipeline: add_stage(ClipWriterStage)
    
    User->>Pipeline: run(XennaExecutor)
    
    Pipeline->>VideoReader: Read videos from VIDEO_DIR
    VideoReader-->>Pipeline: VideoTask objects
    
    Pipeline->>Splitter: Split videos into clips
    Splitter-->>Pipeline: VideoTask with clips
    
    Pipeline->>Transcoder: Transcode clips to H.264
    Transcoder-->>Pipeline: Encoded clips with buffers
    
    Pipeline->>FrameExtractor: Extract frames for embeddings/aesthetics
    FrameExtractor-->>Pipeline: Clips with extracted frames
    
    Pipeline->>FilterStage: Filter by aesthetic score
    FilterStage-->>Pipeline: Filtered clips (valid/invalid flags)
    
    Pipeline->>EmbeddingStage: Create model-ready frames
    EmbeddingStage-->>Pipeline: Clips with cosmos_embed1_frames
    
    Pipeline->>EmbeddingStage: Generate embeddings
    EmbeddingStage-->>Pipeline: Clips with embeddings
    
    Pipeline->>Writer: Write clips, metadata, embeddings
    Writer-->>Pipeline: Files written to OUTPUT_DIR
    
    Pipeline-->>User: Pipeline complete
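The `add_stage` then `run` flow in the diagram can be illustrated with a small stand-in. Everything in this sketch is hypothetical (`ToyPipeline`, `split_fixed_stride`, `filter_short`, and the dictionary-based tasks); it is not the NeMo Curator API and only shows how each stage consumes the previous stage's output. It also simplifies filtering by dropping clips, whereas the diagram describes valid/invalid flags.

```python
from typing import Callable, List

Stage = Callable[[List[dict]], List[dict]]


class ToyPipeline:
    """Hypothetical stand-in for a stage pipeline; not the NeMo Curator API."""

    def __init__(self) -> None:
        self.stages: List[Stage] = []

    def add_stage(self, stage: Stage) -> "ToyPipeline":
        # Stages run in the order they are added.
        self.stages.append(stage)
        return self

    def run(self, tasks: List[dict]) -> List[dict]:
        # Feed the output of each stage into the next one.
        for stage in self.stages:
            tasks = stage(tasks)
        return tasks


def split_fixed_stride(tasks: List[dict], stride_s: float = 10.0) -> List[dict]:
    """Split each video task into consecutive clips of at most stride_s seconds."""
    clips = []
    for task in tasks:
        start = 0.0
        while start < task["duration_s"]:
            end = min(start + stride_s, task["duration_s"])
            clips.append({"video": task["video"], "span": (start, end)})
            start = end
    return clips


def filter_short(tasks: List[dict], min_len_s: float = 2.0) -> List[dict]:
    """Keep only clips that are at least min_len_s seconds long."""
    return [t for t in tasks if t["span"][1] - t["span"][0] >= min_len_s]


pipeline = ToyPipeline()
pipeline.add_stage(split_fixed_stride).add_stage(filter_short)
clips = pipeline.run([{"video": "a.mp4", "duration_s": 21.0}])
```

With a 21-second input, the splitter yields three clips and the filter drops the final one-second clip, so `clips` holds two entries.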

@greptile-apps greptile-apps bot left a comment

1 file reviewed, 1 comment

Comment on lines 363 to 365
### Cosmos-Embed1 (Recommended)


syntax: Section header for Cosmos-Embed1 is empty/incomplete.

Suggested change
### Cosmos-Embed1 (Recommended)
### Cosmos-Embed1 (Recommended)
Cosmos-Embed1 provides high-quality video embeddings with 768 dimensions. It offers better performance than InternVideo2 for most use cases.

@greptile-apps greptile-apps bot left a comment

1 file reviewed, 1 comment

Comment on lines 363 to 365
### Cosmos-Embed1 (Recommended)


syntax: Empty section for Cosmos-Embed1. Add description of features, output dimensions (768-dim embeddings), or usage details.

Suggested change
### Cosmos-Embed1 (Recommended)
### Cosmos-Embed1 (Recommended)
Cosmos-Embed1 provides high-quality video embeddings with support for multiple resolution variants. It's the recommended choice for most video embedding tasks.
- Output: 768-dimensional embeddings
- Variants: `224p`, `336p`, `448p` (higher resolution = better quality, more memory)
- Best for: similarity search, clustering, and general video understanding tasks

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Signed-off-by: Lawrence Lane <llane@nvidia.com>

@greptile-apps greptile-apps bot left a comment

1 file reviewed, 1 comment

Comment on lines 363 to 365
### Cosmos-Embed1 (Recommended)


syntax: Empty Cosmos-Embed1 section. Add content describing the model's features, capabilities, and embedding dimensions (768-dim).

Signed-off-by: Lawrence Lane <llane@nvidia.com>

@greptile-apps greptile-apps bot left a comment

1 file reviewed, 3 comments

### Prerequisites

1. **Set up directories**:

Empty "Set up directories" section. Either add content explaining directory setup or remove this numbered item.

Suggested change
1. **Set up directories**:
### Prerequisites

Comment on lines 26 to 59
```python
from nemo_curator.stages.video.caption.caption_preparation import CaptionPreparationStage
from nemo_curator.stages.video.caption.caption_generation import CaptionGenerationStage
from nemo_curator.stages.video.caption.caption_enhancement import CaptionEnhancementStage

# Prepare frames for captioning
pipeline.add_stage(
    CaptionPreparationStage(
        model_variant="qwen",
        prompt_variant="default",
        sampling_fps=2.0,
        window_size=256,
    )
)

# Generate captions with Qwen-VL
pipeline.add_stage(
    CaptionGenerationStage(
        model_dir="./models",
        model_variant="qwen",
        caption_batch_size=8,
        max_output_tokens=512,
    )
)

# Optional: Enhance captions with LLM
pipeline.add_stage(
    CaptionEnhancementStage(
        model_dir="./models",
        model_variant="qwen",
        model_batch_size=128,
    )
)
```

Code block appears disconnected from section structure. Add a descriptive section header before this captioning example (e.g. "### Adding Captions to Pipeline") and explain what pipeline refers to.

Comment on lines 179 to 181
### Cosmos-Embed1 (Recommended)


Empty Cosmos-Embed1 section. Add description, usage example, or embedding dimensions (768-dim as mentioned in line 274).

lbliii commented Jan 29, 2026

Added suggested changes in a branch to merge into yours (#1442). Feel free to take or leave any of them.

* updates

Signed-off-by: Lawrence Lane <llane@nvidia.com>

* Update tutorials/video/getting-started/README.md

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

---------

Signed-off-by: Lawrence Lane <llane@nvidia.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

@greptile-apps greptile-apps bot left a comment

1 file reviewed, no comments

@lbliii lbliii self-requested a review February 2, 2026 18:36
@lbliii lbliii enabled auto-merge (squash) February 2, 2026 18:36

@greptile-apps greptile-apps bot left a comment

1 file reviewed, 4 comments

)
```

Output: 512-dimensional embeddings per clip.

Verify embedding dimensions. Documentation states Cosmos-Embed1 dimensions "vary by variant" (see docs/curate-video/process-data/dedup.md), but this claims 512 dims uniformly.

Suggested change
Output: 512-dimensional embeddings per clip.
Output: Embedding dimensions vary by variant (typically 512 or 768 dimensions per clip).

- **`embedding`**: List of float values (512 dimensions for InternVideo2, 768 for Cosmos-Embed1)

- `id`: String UUID for the clip
- `embedding`: List of float values (512 dimensions for both InternVideo2 and Cosmos-Embed1)

Embedding dimension claim needs verification. Official docs state Cosmos-Embed1 dimensions vary by variant, contradicting "512 dimensions for both" claim.

Suggested change
- `embedding`: List of float values (512 dimensions for both InternVideo2 and Cosmos-Embed1)
- `embedding`: List of float values (512 dimensions for InternVideo2; Cosmos-Embed1 dimensions vary by variant)


### InternVideo2

Alternative embedding model requiring separate installation. InternVideo2 also produces 512-dimensional embeddings.

Same embedding dimension issue as line 238.

Suggested change
Alternative embedding model requiring separate installation. InternVideo2 also produces 512-dimensional embeddings.
Alternative embedding model requiring separate installation. InternVideo2 produces 512-dimensional embeddings.

pipeline.add_stage(InternVideo2EmbeddingStage(model_dir="./models"))
```

Output: 512-dimensional embeddings per clip.

Same embedding dimension issue - remove incorrect claim about matching InternVideo2's dimensions.

Suggested change
Output: 512-dimensional embeddings per clip.
Output: 512-dimensional embeddings per clip.
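The metadata fields quoted in the comments above (`id` as a string UUID, `embedding` as a list of floats) lend themselves to a per-record check. This is a minimal sketch under stated assumptions: `validate_record` is a hypothetical helper, records are assumed to be plain dictionaries, and `expected_dim` must be supplied by the caller from the model documentation, since the correct value for Cosmos-Embed1 is exactly what this review questions.

```python
import uuid
from numbers import Real


def validate_record(record: dict, expected_dim: int) -> None:
    """Raise ValueError if a clip record does not match the described schema.

    Hypothetical helper: checks only the two fields quoted in the review,
    `id` (string UUID) and `embedding` (list of floats of a given length).
    """
    clip_id = record.get("id")
    if not isinstance(clip_id, str):
        raise ValueError("id must be a string")
    uuid.UUID(clip_id)  # raises ValueError when the string is not a UUID

    embedding = record.get("embedding")
    if not isinstance(embedding, list):
        raise ValueError("embedding must be a list")
    for value in embedding:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, Real):
            raise ValueError("embedding must contain only numbers")
    if len(embedding) != expected_dim:
        raise ValueError(
            f"expected {expected_dim} values, got {len(embedding)}"
        )
```

Passing the dimension in explicitly keeps the check usable for any variant and avoids baking a disputed number into the code.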
