A tool for downloading and exploring YouTube's auto-captions for playlists/channels etc.
It's still very rough: everything runs as a dev build/server, and only surface-level optimisation/architecture is present. It's still enough to function though, and can handle (a little sluggishly) 2000+ video playlists with at least a couple of months of dialogue.
Using `youtube-dl`, playlist information is downloaded and auto-subtitle content for each video is pulled.
Subtitle content is searched and explored using a basic user interface built with React.
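As a rough illustration of the download step, here is a minimal sketch of how `youtube-dl` could be driven from Node. The function names (`playlistInfoArgs`, `subtitleArgs`, `runYoutubeDl`), the output path layout, and the default language are assumptions made for this example and may not match what the repo actually does; the command-line flags themselves are standard `youtube-dl` options.

```javascript
const { spawn } = require("child_process");

// Arguments for listing a playlist's entries as one JSON document,
// without resolving every individual video page.
function playlistInfoArgs(playlistUrl) {
  return ["--flat-playlist", "--dump-single-json", playlistUrl];
}

// Arguments for fetching subtitles only (no video download) for one video.
// Both official and auto-generated tracks are requested.
// The output template and the "en" default are assumptions for this sketch.
function subtitleArgs(videoId, outDir, lang = "en") {
  return [
    "--skip-download",
    "--write-sub",
    "--write-auto-sub",
    "--sub-lang", lang,
    "--sub-format", "vtt",
    "-o", `${outDir}/%(id)s.%(ext)s`,
    `https://www.youtube.com/watch?v=${videoId}`,
  ];
}

// Runs youtube-dl and resolves with its stdout; rejects on a non-zero exit.
function runYoutubeDl(args) {
  return new Promise((resolve, reject) => {
    const child = spawn("youtube-dl", args);
    let stdout = "";
    child.stdout.on("data", (chunk) => { stdout += chunk; });
    child.on("error", reject);
    child.on("close", (code) =>
      code === 0
        ? resolve(stdout)
        : reject(new Error(`youtube-dl exited with code ${code}`)));
  });
}
```

With something like this, `runYoutubeDl(playlistInfoArgs(url))` would yield the playlist JSON, and each entry's id could then be passed to `subtitleArgs` to pull its captions.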
If you've ever wanted to quickly search for something that was said in a series of videos, or an entire channel, this is for you. Some use cases include:
- Searching for concepts covered in tutorial series
- Exploring ideas and concepts in debate/video essay style channels
- Finding that one thing someone said in a video when you can't remember which one
- Exploring the lore behind story-heavy channels
- Install Node.js
- Clone the repo
- `npm i` inside the repo to install dependencies
- `npm run ytdl [playlist_url] [playlist_url] ...` downloads playlists to be explored, wait for it to finish before moving on
  - Additionally specify `update` with `npm run ytdl` to update any existing playlist data, in addition to any new URLs provided
  - Additionally specify `threads=x` with `npm run ytdl` to set the number of youtube-dl processes to spawn [min=1, max=100]
- `npm start` should open the UI in web browser!
  - Spam `ctrl-c` to close I guess
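To make the `update` and `threads=x` options more concrete, here is a hedged sketch of how such arguments could be parsed and how a bounded number of download jobs could run at once. `parseArgs` and `runPool` are hypothetical names, and the default of one thread is an assumption; only the `[1, 100]` range comes from the description above.

```javascript
// Splits CLI arguments into options and playlist URLs.
// Anything that is not `update` or `threads=x` is treated as a URL.
function parseArgs(argv) {
  const options = { update: false, threads: 1, urls: [] }; // default of 1 is assumed
  for (const arg of argv) {
    if (arg === "update") {
      options.update = true;
    } else if (arg.startsWith("threads=")) {
      const n = Number.parseInt(arg.slice("threads=".length), 10);
      // Clamp to the documented range [1, 100]; fall back to 1 if not a number.
      options.threads = Number.isNaN(n) ? 1 : Math.min(100, Math.max(1, n));
    } else {
      options.urls.push(arg);
    }
  }
  return options;
}

// Runs async task functions with at most `limit` of them in flight at once.
// Results are returned in the same order as the tasks.
async function runPool(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;
  async function worker() {
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i]();
    }
  }
  const workers = Array.from({ length: Math.min(limit, tasks.length) }, () => worker());
  await Promise.all(workers);
  return results;
}
```

Each task here would wrap one `youtube-dl` invocation, so `threads=8` would keep up to eight processes busy until the playlist is exhausted.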
I highly recommend using a VPN to avoid being IP limited or banned.
- You must provide a playlist. Even if you just want one video, wrap it in a playlist!
- YouTube does a lot with playlists. For example, clicking "Play all" on a channel's uploads will give you a playlist link. This is how you index an entire channel.
- The subtitles are, in most cases, provided by YouTube's auto-captions, though it does download official ones if present!
- This tool does NO speech-to-text. All captions are from YouTube, so if the subtitles say one thing, but the video says another, that's not my fault
- This tool can make a LOT of requests to YouTube, which may result in YouTube temporarily restricting connections from your device - use it at your own discretion
- It's clunky and buggy, and is not an enterprise-ready™ solution. As such, a lot of the implemented QoL features, like updating local playlists, employ naive solutions which may not work well. For example, if you index a video which does not have subtitles, but then has subtitles added to it, the tool won't be aware of this.
- If in doubt, delete the playlist and download it again!
- Pull playlist information
- Pull video information
- Pull subtitles for each video
- Implement basic API for front-end to access the data
- Secondary STT method for extracting subtitles - I don't want to be liable for mis-subtitling.
- Figure out how to efficiently search for text across millions of subtitle entries
  - FlexSearch
- Refresh playlist_data folder on changes so you don't need to keep restarting the UI
- Include playlist title in Ytd
- Download playlists from UI

- Init basic React app
- API Component
- Search component
- Results page component
- Result component
- Video transcript explorer
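
The list above names FlexSearch for searching across millions of subtitle entries. The sketch below does not use FlexSearch's API; it hand-rolls a tiny inverted index just to illustrate why an index beats scanning every entry on each query. The entry shape (`videoId`, `start`, `text`) and all function names are assumptions for this example.

```javascript
// Lowercases text and splits it into word tokens.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9']+/g) || [];
}

// Builds a map from each token to the set of entry indices containing it.
function buildIndex(entries) {
  const index = new Map();
  entries.forEach((entry, i) => {
    for (const token of tokenize(entry.text)) {
      if (!index.has(token)) index.set(token, new Set());
      index.get(token).add(i);
    }
  });
  return index;
}

// Returns the entries that contain every word in the query.
function search(index, entries, query) {
  const tokens = tokenize(query);
  if (tokens.length === 0) return [];
  let hits = null;
  for (const token of tokens) {
    const ids = index.get(token) || new Set();
    // Intersect the running result with the entries containing this token.
    hits = hits === null
      ? new Set(ids)
      : new Set([...hits].filter((i) => ids.has(i)));
  }
  return [...hits].sort((a, b) => a - b).map((i) => entries[i]);
}

// Hypothetical subtitle entries: one per caption cue.
const entries = [
  { videoId: "vid1", start: 12.5, text: "Welcome back to the tutorial" },
  { videoId: "vid1", start: 40.0, text: "Today we cover closures" },
  { videoId: "vid2", start: 3.2, text: "Closures in the wild" },
];
const index = buildIndex(entries);
console.log(search(index, entries, "closures"));
```

Here the query matches the second and third entries, and each hit carries the video id and start time needed to jump to that moment. A real library such as FlexSearch adds partial matching, scoring, and memory optimisations that this sketch leaves out.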