WIP: Transcripting code donation #2777

moxious · 2025-05-23T22:11:40Z

tl;dr what is this? It's a small Python script, with instructions, that fetches YouTube transcripts and summarizes them into clean Markdown files. The intent is to store the public text associated with those videos. That is useful by itself, but it gets better when combined with open-telemetry/opentelemetry.io#6769: Kapa can be trained on these files, which gives OTel a sustainable way to do Q&A on the website based on video content.

The core of this PR is the Python code, which isn't that big. Most of the line changes are the Markdown files that the Python code produces as output.
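For readers who want a feel for the output step, here is a minimal sketch of turning a transcript plus a summary into a Markdown file body. It is an illustration only, not the code in this PR: the function names `format_timestamp` and `render_markdown`, the section headings, and the segment shape (a dict with `text` and `start` in seconds) are all assumptions, and the summary is passed in as a plain string rather than requested from OpenAI.

```python
def format_timestamp(seconds: float) -> str:
    """Convert a start offset in seconds to HH:MM:SS."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render_markdown(title: str, video_id: str, summary: str, segments: list) -> str:
    """Build a Markdown document holding both the cleaned summary and the raw transcript.

    `segments` is assumed to be a list of dicts with "text" and "start" keys
    (hypothetical shape; adapt it to whatever the transcript source returns).
    """
    lines = [
        f"# {title}",
        "",
        f"Source: https://www.youtube.com/watch?v={video_id}",
        "",
        "## Summary",
        "",
        summary.strip(),
        "",
        "## Raw transcript",
        "",
    ]
    for seg in segments:
        # Collapse the line breaks and repeated spaces that raw captions often contain.
        text = " ".join(seg["text"].split())
        lines.append(f"- [{format_timestamp(seg['start'])}] {text}")
    return "\n".join(lines) + "\n"
```

Keeping the raw transcript in the same file as the summary mirrors the approach described in this PR, where both versions are stored side by side for comparison.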

moxious · 2025-05-23T22:17:22Z

Current known limitations: this works by pulling a raw YouTube transcript and then summarizing and cleaning it up. So when the raw YouTube transcript is imperfect (which it often is with names), errors do happen: "Reese Lee" sometimes becomes "Ree Lee" and "Adriana Villela" becomes "Adriana Villa". Both the "nice cleaned up version" and the "very messy YouTube original" are included for comparison (and also so it's harder for OpenAI to fool me).
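One possible mitigation for the name errors, sketched below, is a small post-processing pass that replaces known mis-transcriptions with the correct spelling. This is not part of the PR as described; `KNOWN_CORRECTIONS` and `fix_names` are hypothetical names, and the table would have to be maintained by hand as new errors are spotted.

```python
import re

# Hypothetical correction table: mis-transcribed name -> correct name.
KNOWN_CORRECTIONS = {
    "Ree Lee": "Reese Lee",
    "Adriana Villa": "Adriana Villela",
}


def fix_names(text: str, corrections: dict = KNOWN_CORRECTIONS) -> str:
    """Replace known mis-transcribed names, matching whole words only."""
    for wrong, right in corrections.items():
        # Word boundaries keep the pattern from matching inside longer words.
        pattern = re.compile(rf"\b{re.escape(wrong)}\b")
        text = pattern.sub(right, text)
    return text


print(fix_names("Thanks to Ree Lee and Adriana Villa for hosting."))
```

A lookup table like this only fixes errors that someone has already noticed, so it complements rather than replaces keeping the raw transcript for comparison.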

moxious · 2025-05-27T11:17:25Z

The spell checker action will ultimately be impossible to pass with raw YouTube transcripts; in many cases it also flags names (some correct, some incorrect) as unknown words. Will probably need some advice on what to do in this case since there's some tension between "capture what people said" and "make sure it's correct"

dmathieu · 2025-05-27T12:19:07Z

cspell could be made to ignore the transcripts folder.

danielgblanco · 2025-05-27T13:51:40Z

As this is aimed at YouTube transcripts, and I see how it can be really useful for the content the End-User SIG publishes, would it make more sense if this PR is opened against https://github.com/open-telemetry/sig-end-user ?

cc @avillela @reese-lee

svrnm · 2025-06-02T09:58:57Z

> As this is aimed at YouTube transcripts, and I see how it can be really useful for the content the End-User SIG publishes, would it make more sense if this PR is opened against open-telemetry/sig-end-user ?
>
> cc @avillela @reese-lee

Not all recordings are from the End-User SIG, right? I think we can start with community and later see if there are better places to have them.

danielgblanco · 2025-06-02T10:16:12Z

You're right. We do have YouTube videos that come from the Comms SIG. However, as those tend to refer to documentation, do we think this tool is equally useful there? My thinking behind putting this in a repo other than community is that it would make it easier to manage permissions for maintaining those scripts.

trask · 2025-06-03T20:14:28Z

hi @moxious, can you send this PR to https://github.com/open-telemetry/sig-end-user instead? thanks

moxious added 2 commits May 23, 2025 17:40

first commit of transcript script for OTel channel w/OpenAI integration

181cd18

first batch of transcripts output

d99d2df

moxious requested review from alolita, austinlparker, danielgblanco, jpkrohling, mtwo, mx-psi, svrnm, tedsuo, trask and a team as code owners May 23, 2025 22:11

second batch of transcripts

8e9a485
