Signed-off-by: Ao Tang <aot@nvidia.com>
Greptile Overview
Greptile Summary
Overview
This PR significantly expands the video tutorial README from a basic reference to a comprehensive, tutorial-style guide with extensive Python code examples.
Key Changes
Issues Found
Recommendations
Verify and correct the embedding dimension specifications before merging, as incorrect dimensional information could cause runtime errors in downstream tasks like deduplication or clustering.
Confidence Score: 3/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant User
participant Pipeline
participant VideoReader
participant Splitter
participant Transcoder
participant FrameExtractor
participant FilterStage
participant EmbeddingStage
participant Writer
User->>Pipeline: Initialize pipeline
User->>Pipeline: add_stage(VideoReader)
User->>Pipeline: add_stage(FixedStrideExtractorStage/TransNetV2)
User->>Pipeline: add_stage(ClipTranscodingStage)
User->>Pipeline: add_stage(ClipFrameExtractionStage)
User->>Pipeline: add_stage(ClipAestheticFilterStage)
User->>Pipeline: add_stage(CosmosEmbed1FrameCreationStage)
User->>Pipeline: add_stage(CosmosEmbed1EmbeddingStage)
User->>Pipeline: add_stage(ClipWriterStage)
User->>Pipeline: run(XennaExecutor)
Pipeline->>VideoReader: Read videos from VIDEO_DIR
VideoReader-->>Pipeline: VideoTask objects
Pipeline->>Splitter: Split videos into clips
Splitter-->>Pipeline: VideoTask with clips
Pipeline->>Transcoder: Transcode clips to H.264
Transcoder-->>Pipeline: Encoded clips with buffers
Pipeline->>FrameExtractor: Extract frames for embeddings/aesthetics
FrameExtractor-->>Pipeline: Clips with extracted frames
Pipeline->>FilterStage: Filter by aesthetic score
FilterStage-->>Pipeline: Filtered clips (valid/invalid flags)
Pipeline->>EmbeddingStage: Create model-ready frames
EmbeddingStage-->>Pipeline: Clips with cosmos_embed1_frames
Pipeline->>EmbeddingStage: Generate embeddings
EmbeddingStage-->>Pipeline: Clips with embeddings
Pipeline->>Writer: Write clips, metadata, embeddings
Writer-->>Pipeline: Files written to OUTPUT_DIR
Pipeline-->>User: Pipeline complete
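The stage ordering in the diagram can be sketched without the library. The block below is a stand-in only, not NeMo Curator code: `ToyPipeline`, `Clip`, `split_fixed_stride`, `aesthetic_filter`, and `fake_embed` are hypothetical names chosen for illustration. It shows two points from the diagram: each stage receives the previous stage's output, and the filter flags clips as valid or invalid rather than dropping them.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Clip:
    """Hypothetical clip record; not the library's data model."""
    span: Tuple[float, float]
    aesthetic_score: float = 0.0
    valid: bool = True
    embedding: List[float] = field(default_factory=list)


Stage = Callable[[List[Clip]], List[Clip]]


class ToyPipeline:
    """Runs stages in the order they were added."""

    def __init__(self) -> None:
        self.stages: List[Stage] = []

    def add_stage(self, stage: Stage) -> "ToyPipeline":
        self.stages.append(stage)
        return self

    def run(self, clips: List[Clip]) -> List[Clip]:
        for stage in self.stages:
            clips = stage(clips)
        return clips


def split_fixed_stride(duration: float, stride: float) -> List[Clip]:
    """Cut [0, duration) into consecutive spans of at most `stride` seconds."""
    clips = []
    start = 0.0
    while start < duration:
        clips.append(Clip(span=(start, min(start + stride, duration))))
        start += stride
    return clips


def aesthetic_filter(threshold: float) -> Stage:
    """Flag clips below the threshold as invalid; keep them in the list."""
    def stage(clips: List[Clip]) -> List[Clip]:
        for clip in clips:
            clip.valid = clip.aesthetic_score >= threshold
        return clips
    return stage


def fake_embed(dim: int) -> Stage:
    """Attach a placeholder zero vector of length `dim` to valid clips only."""
    def stage(clips: List[Clip]) -> List[Clip]:
        for clip in clips:
            if clip.valid:
                clip.embedding = [0.0] * dim
        return clips
    return stage


clips = split_fixed_stride(duration=25.0, stride=10.0)
for clip, score in zip(clips, [0.9, 0.2, 0.7]):
    clip.aesthetic_score = score  # made-up scores for the example

out = ToyPipeline().add_stage(aesthetic_filter(0.5)).add_stage(fake_embed(4)).run(clips)
```

In this toy run the middle clip stays in `out` with `valid` set to `False` and no embedding, which mirrors the "valid/invalid flags" step in the diagram.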
### Cosmos-Embed1 (Recommended)
syntax: Section header for Cosmos-Embed1 is empty/incomplete.
- ### Cosmos-Embed1 (Recommended)
+ ### Cosmos-Embed1 (Recommended)
+ Cosmos-Embed1 provides high-quality video embeddings with 768 dimensions. It offers better performance than InternVideo2 for most use cases.
### Cosmos-Embed1 (Recommended)
syntax: Empty section for Cosmos-Embed1. Add description of features, output dimensions (768-dim embeddings), or usage details.
- ### Cosmos-Embed1 (Recommended)
+ ### Cosmos-Embed1 (Recommended)
+ Cosmos-Embed1 provides high-quality video embeddings with support for multiple resolution variants. It's the recommended choice for most video embedding tasks.
+ - Output: 768-dimensional embeddings
+ - Variants: `224p`, `336p`, `448p` (higher resolution = better quality, more memory)
+ - Best for: similarity search, clustering, and general video understanding tasks
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Signed-off-by: Lawrence Lane <llane@nvidia.com>
### Cosmos-Embed1 (Recommended)
syntax: Empty Cosmos-Embed1 section. Add content describing the model's features, capabilities, and embedding dimensions (768-dim).
Signed-off-by: Lawrence Lane <llane@nvidia.com>
### Prerequisites

1. **Set up directories**:
Empty "Set up directories" section. Either add content explaining directory setup or remove this numbered item.
- 1. **Set up directories**:
+ ### Prerequisites
```python
from nemo_curator.stages.video.caption.caption_preparation import CaptionPreparationStage
from nemo_curator.stages.video.caption.caption_generation import CaptionGenerationStage
from nemo_curator.stages.video.caption.caption_enhancement import CaptionEnhancementStage

# Prepare frames for captioning
pipeline.add_stage(
    CaptionPreparationStage(
        model_variant="qwen",
        prompt_variant="default",
        sampling_fps=2.0,
        window_size=256,
    )
)

# Generate captions with Qwen-VL
pipeline.add_stage(
    CaptionGenerationStage(
        model_dir="./models",
        model_variant="qwen",
        caption_batch_size=8,
        max_output_tokens=512,
    )
)

# Optional: Enhance captions with LLM
pipeline.add_stage(
    CaptionEnhancementStage(
        model_dir="./models",
        model_variant="qwen",
        model_batch_size=128,
    )
)
```
Code block appears disconnected from section structure. Add a descriptive section header before this captioning example (e.g. "### Adding Captions to Pipeline") and explain what pipeline refers to.
### Cosmos-Embed1 (Recommended)
Empty Cosmos-Embed1 section. Add description, usage example, or embedding dimensions (768-dim as mentioned in line 274).
Added suggested changes in a branch to merge into yours. Feel free to take or leave any of them: #1442
* updates

Signed-off-by: Lawrence Lane <llane@nvidia.com>

* Update tutorials/video/getting-started/README.md

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

---------

Signed-off-by: Lawrence Lane <llane@nvidia.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
)
```

Output: 512-dimensional embeddings per clip.
Verify embedding dimensions. Documentation states Cosmos-Embed1 dimensions "vary by variant" (see docs/curate-video/process-data/dedup.md), but this claims 512 dims uniformly.
- Output: 512-dimensional embeddings per clip.
+ Output: Embedding dimensions vary by variant (typically 512 or 768 dimensions per clip).
- **`embedding`**: List of float values (512 dimensions for InternVideo2, 768 for Cosmos-Embed1)

- `id`: String UUID for the clip
- `embedding`: List of float values (512 dimensions for both InternVideo2 and Cosmos-Embed1)
Embedding dimension claim needs verification. Official docs state Cosmos-Embed1 dimensions vary by variant, contradicting "512 dimensions for both" claim.
- - `embedding`: List of float values (512 dimensions for both InternVideo2 and Cosmos-Embed1)
+ - `embedding`: List of float values (512 dimensions for InternVideo2; Cosmos-Embed1 dimensions vary by variant)
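Since the review leaves the actual Cosmos-Embed1 dimension unresolved, one way to settle it is to measure the written output rather than trust the README. The sketch below assumes each record is a dict with the `id` and `embedding` fields quoted above; the helper names `embedding_dim_counts` and `assert_uniform_dim` are hypothetical, and loading records from the writer's output files is left out because the on-disk format is not shown in this thread.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def embedding_dim_counts(records: Iterable[dict]) -> Dict[int, int]:
    """Map each observed embedding length to the number of records with it."""
    return dict(Counter(len(record["embedding"]) for record in records))


def assert_uniform_dim(records: Iterable[dict], expected_dim: Optional[int] = None) -> int:
    """Return the single embedding length shared by all records.

    Raises ValueError if the records are empty, disagree on length,
    or do not match `expected_dim` when one is given.
    """
    counts = embedding_dim_counts(records)
    if len(counts) != 1:
        raise ValueError(f"expected one embedding length, found: {counts}")
    (dim,) = counts
    if expected_dim is not None and dim != expected_dim:
        raise ValueError(f"embedding length {dim} != expected {expected_dim}")
    return dim


# Made-up records for illustration; real ones would come from the pipeline output.
sample_records = [
    {"id": "clip-a", "embedding": [0.0] * 768},
    {"id": "clip-b", "embedding": [0.0] * 768},
]
observed_dim = assert_uniform_dim(sample_records)
```

Running a check like this once per model variant would give the README a measured value to state in place of a uniform 512 or 768 claim.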
### InternVideo2

Alternative embedding model requiring separate installation. InternVideo2 also produces 512-dimensional embeddings.
Same embedding dimension issue as line 238.
- Alternative embedding model requiring separate installation. InternVideo2 also produces 512-dimensional embeddings.
+ Alternative embedding model requiring separate installation. InternVideo2 produces 512-dimensional embeddings.
pipeline.add_stage(InternVideo2EmbeddingStage(model_dir="./models"))
```

Output: 512-dimensional embeddings per clip.
Same embedding dimension issue - remove incorrect claim about matching InternVideo2's dimensions.
- Output: 512-dimensional embeddings per clip.
+ Output: 512-dimensional embeddings per clip.
Description
Usage
# Add snippet demonstrating usage
Checklist