Skip to content

fix(web): match YouTube URLs by hostname instead of substring#1182

Open
abhay-codes07 wants to merge 1 commit into
supermemoryai:mainfrom
abhay-codes07:fix/youtube-url-hostname-detection
Open

fix(web): match YouTube URLs by hostname instead of substring#1182
abhay-codes07 wants to merge 1 commit into
supermemoryai:mainfrom
abhay-codes07:fix/youtube-url-hostname-detection

Conversation

@abhay-codes07

Copy link
Copy Markdown

What

Fixes #1175

YouTube detection in the web app was done with substring checks (url.includes("youtube.com") etc.) in four places, so any URL that merely contains those strings got classified as YouTube:

isYouTubeUrl("https://notyoutube.com/watch?v=dQw4w9WgXcQ")            // was true
isYouTubeUrl("https://evil.example/youtube.com/watch?v=dQw4w9WgXcQ")  // was true
isYouTubeUrl("https://youtube.com.evil.example/watch?v=dQw4w9WgXcQ")  // was true

Those URLs then got the YouTube icon in document-icon.tsx, were grouped under "YouTube Videos" in the timeline view, rendered as a YouTube preview on memory cards, and opened in the embed player in the document modal.

How

  • Added a hostname-based isYouTubeUrl() to apps/web/lib/url-helpers.ts, reusing the existing parseWebUrl + hostnameMatches helpers that isTwitterUrl already uses. It matches youtube.com / youtu.be and their real subdomains (www., m., music.) only.
  • components/utils.ts now re-exports the shared helper, so existing imports (memory card previews, useYouTubeChannelName) are unchanged.
  • Switched the three inline substring checks (document-modal/content/index.tsx, document-icon.tsx, timeline-view.tsx) to the shared helper.
  • Side effect worth calling out: the document modal previously only checked for youtube.com, so youtu.be short links fell through to the webpage renderer. They now open in the embed player — yt-video.tsx already knew how to extract IDs from short links, so no change was needed there.

Uppercase schemes (HTTPS://youtube.com/...) and scheme-less inputs (youtube.com/watch?v=...) keep working since parseWebUrl normalizes both.

Testing

  • Added apps/web/lib/url-helpers.test.ts — 10 cases covering canonical/watch/embed/shorts URLs, real subdomains, short links, uppercase schemes, scheme-less input, lookalike domains, subdomain tricks, path-only matches, and nullish input. All pass with bun test.
  • Existing web tests still pass (24 tests across 3 files).
  • biome check clean on all touched files.
  • bun run build for apps/web completes successfully.

cc @MaheshtheDev

isYouTubeUrl() and a few inline checks classified any URL containing
youtube.com / youtu.be as YouTube, so lookalike hosts like
notyoutube.com, youtube.com.evil.example, or URLs that only carry
youtube.com in the path were rendered with the YouTube icon, grouped
as YouTube videos, and opened in the embed player.

Centralize detection in lib/url-helpers.ts using the same
parseWebUrl/hostnameMatches approach isTwitterUrl already uses, and
switch the document modal, document icon, and timeline view to the
shared helper. The modal now also recognizes youtu.be short links,
which the player component already supported.

Add regression tests covering lookalike domains, subdomain tricks,
path-only matches, short/mobile links, uppercase schemes, scheme-less
URLs, and nullish input.

Fixes supermemoryai#1175
Copilot AI review requested due to automatic review settings July 2, 2026 00:45

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

@vorflux vorflux Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Reviewed — found 1 issue(s). This PR changes YouTube URL detection to use hostname-based matching and updates the web UI call sites plus helper coverage. The review focused on correctness around the widened detection behavior and consistency with existing video ID extraction.

Findings

apps/web/lib/url-helpers.ts

  1. Uppercase YouTube hosts are now detected, but the existing video ID extraction path is still case-sensitive, so some newly accepted URLs can render as invalid instead of embedding.

Verdict

⚠️ Changes requested. The detection and extraction paths need to handle the same accepted URL shapes before this is safe to merge.


Review with Vorflux

* Lookalike hosts (`notyoutube.com`, `youtube.com.evil.example`) and URLs
* that only contain "youtube.com" in the path do not match.
*/
export const isYouTubeUrl = (url: string | undefined | null): boolean => {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isYouTubeUrl() now intentionally accepts uppercase YouTube hosts, as covered by the new cases for HTTPS://youtube.com / WWW.YOUTUBE.COM, but the existing video ID extractors still use case-sensitive regexes for youtube.com / youtu.be. A URL like HTTPS://WWW.YOUTUBE.COM/watch?v=dQw4w9WgXcQ will now be routed to YoutubeVideo / YoutubePreview, but extractVideoId() / extractYouTubeVideoId() return null, so the modal shows Invalid YouTube URL format and the card cannot embed. Please make the extraction path URL/hostname-based too, or at least case-insensitive, before broadening detection.

@vorflux

vorflux Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Testing

The testing subagent verified the hostname-based YouTube detection behavior through targeted helper coverage plus authenticated UI checks for canonical YouTube URLs, non-YouTube URLs containing youtube.com in the path, and notyoutube.com lookalike behavior. The changed behavior passed, with no production code changes pushed.

Commands run:

PATH="/home/ubuntu/.bun/bin:$PATH" bun test apps/web/lib/url-helpers.test.ts

Result:

10 pass
0 fail
21 expect() calls

Canonical YouTube URL classified and rendered as video.
Non-YouTube URL containing youtube.com in the path classified and rendered as webpage.
notyoutube.com lookalike rendered as webpage in scaffolded UI verification.

Evidence:

Verdict

Passed. The tested detection behavior works for canonical hosts and avoids the original substring false positives; one separate review finding remains around keeping video ID extraction consistent with the broader detection.


Attached Images and Videos

pr1182_real_api_list.png

pr1182_real_api_timeline.png

pr1182_real_api_modal_youtube.png

🎥 View recording: pr1182_youtube_hostname_real_api_ui.webm

pr1182_real_api_modal_httpbin.png

pr1182_list_grid_scaffolded.png

pr1182_timeline_scaffolded.png

pr1182_modal_lookalike_scaffolded.png

🎥 View recording: pr1182_youtube_hostname_verified_ui.webm

pr1182_public_preview_real_api_final.png

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

YouTube URL detection matches domains by substring

2 participants