[WIP] short blogpost on tooling around processing video datasets #2631

Merged: 13 commits, Feb 12, 2025
Apply suggestions from code review
Co-authored-by: Pedro Cuenca <[email protected]>
hlky and pcuenca authored Feb 6, 2025

commit ad98d0058080f9852d4ec6dc7c50ea0e7d5b3b2e
6 changes: 3 additions & 3 deletions vid_ds_scripts.md
@@ -8,13 +8,13 @@ authors:

# Build awesome datasets for video generation

-Tooling for image generation datasets is well established, with [`img2dataset`](https://github.com/rom1504/img2dataset) covering large scale and various community guides, scripts and UIs covering the small scale.
+Tooling for image generation datasets is well established, with [`img2dataset`](https://github.com/rom1504/img2dataset) being a fundamental tool used for large scale dataset preparation, and complemented with various community guides, scripts and UIs that cover smaller scale initiatives.

-Our goal is to make tooling for video generation datasets as established by creating open video dataset scripts suited for small scale, with [`video2dataset`](https://github.com/iejMac/video2dataset) covering large scale.
+Our ambition is to make tooling for video generation datasets equally established, by creating open video dataset scripts suited for small scale, and leveraging [`video2dataset`](https://github.com/iejMac/video2dataset) for large scale use cases.

*“If I have seen further it is by standing on the shoulders of giants”*

-In this post, we will provide a overview of the tooling we are developing to make it easy for the community to build their own datasets for fine-tuning video generation models. If you cannot wait to get started already, we welcome you to check out the codebase [here](https://github.com/huggingface/video-dataset-scripts/tree/main/video_processing).
+In this post, we provide an overview of the tooling we are developing to make it easy for the community to build their own datasets for fine-tuning video generation models. If you cannot wait to get started already, we welcome you to check out the codebase [here](https://github.com/huggingface/video-dataset-scripts/tree/main/video_processing).

**Table of contents**