feat(md-exports): Inject frontmatter descriptions into MD exports for LLM relevance #16468
… LLM relevance

Inject frontmatter `description` as italic text after the H1 heading in generated .md exports so LLM agents can quickly assess page relevance. MDX override pages are skipped since they have custom intros.

Also:
- Add missing descriptions to 11 platform root pages
- Update specs and contributing docs with description injection details
- Set up @sentry/dotagents with brand-guidelines skill
- Add Content Authoring section to AGENTS.md referencing skills
- Wire dotagents install into Makefile develop target

Fixes #16420

Co-Authored-By: Claude <noreply@anthropic.com>
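A minimal sketch of the injection described in this commit, assuming ATX-style headings (`# Title`) and a plain-string description. `insertAfterH1` is a hypothetical name, not the PR's `injectDescription`; it splices lines instead of using a regex replacement.

```javascript
// Hypothetical helper (not the PR's injectDescription): insert an italic
// description after the first ATX H1 ("# Title") in a Markdown string.
// Returns the input unchanged when there is no description or no H1.
function insertAfterH1(markdown, description) {
  if (!description) return markdown;
  const lines = markdown.split('\n');
  // "# " followed by text; "## Sub" and deeper headings do not match.
  const h1Index = lines.findIndex((line) => /^# \S/.test(line));
  if (h1Index === -1) return markdown;
  // Blank lines keep the italic paragraph separate from heading and body.
  lines.splice(h1Index + 1, 0, '', `*${description}*`);
  return lines.join('\n');
}
```

This sketch does not skip `#` lines inside fenced code blocks, and a description containing `*` would need escaping; the actual script may handle both differently.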
Inject a documentation index link and platform-specific navigation after the description in MD exports. Guide pages (e.g., Flask) also get a link back to their platform index (e.g., Python SDK docs). This helps LLM agents navigate between related pages. Co-Authored-By: Claude <noreply@anthropic.com>
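One way the navigation links could be derived, as a hedged sketch. The `platforms/<platform>/guides/<guide>` layout, the link labels, and the `buildNavLinks`, `indexUrl`, and `baseUrl` names are all assumptions for illustration.

```javascript
// Hypothetical sketch: build navigation links for an exported page.
// Assumes guide pages live at "platforms/<platform>/guides/<guide>" and
// that the platform index lives at "platforms/<platform>/".
function buildNavLinks(pagePath, {indexUrl, baseUrl}) {
  // Every page links back to the documentation index.
  const links = [`[Documentation index](${indexUrl})`];
  const parts = pagePath.split('/').filter(Boolean);
  const guidesAt = parts.indexOf('guides');
  // Only pages below platforms/<platform>/guides/ get a platform link.
  if (parts[0] === 'platforms' && guidesAt === 2 && parts.length > 3) {
    const platformPath = parts.slice(0, 2).join('/');
    links.push(`[${parts[1]} SDK docs](${baseUrl}/${platformPath}/)`);
  }
  return links;
}
```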
Let dotagents manage skill installation at dev time rather than checking the skills into the repo. `make develop` runs `dotagents install`. Co-Authored-By: Claude <noreply@anthropic.com>
Use a Map for R2 uploads so description injection overwrites stale entries from child-section appending for the same page. Use a replacer function in injectDescription to prevent `$` sequences in descriptions from being interpreted as special replacement patterns. Co-Authored-By: Claude <noreply@anthropic.com>
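The `$` problem is easy to reproduce. This standalone snippet (the regex, page text, and variable names are illustrative) shows `String.prototype.replace` expanding `$1` inside a description when a replacement string is used, and inserting the same text literally when a replacer function is used.

```javascript
// Capture the H1 line as group 1.
const heading = /^(# .+)$/m;
const markdown = '# Pricing\n\nBody';
const description = 'Plans start at $1 per seat';

// Replacement string: every "$1" is expanded, including the one that came
// from the description, so the heading text leaks into the summary.
const unsafe = markdown.replace(heading, `$1\n\n*${description}*`);

// Replacer function: the returned string is inserted verbatim.
const safe = markdown.replace(heading, (match) => `${match}\n\n*${description}*`);
```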
Move hash comparison from collectR2Upload to upload time so the Map always holds the latest content per key. Previously, if description injection produced content matching R2's existing hash, the stale child-section-only entry would persist and get uploaded. Also remove duplicate pathParts computation in guide link logic. Co-Authored-By: Claude <noreply@anthropic.com>
@dcramer why don't we just expose the frontmatter? That's how Cloudflare did it, so apparently agents are fine with that.

We can use `remark-frontmatter` in the pipeline and just feed it the page title, description, etc.
BYK left a comment
Not a fan of doing this in a separate post-processing pass; I strongly believe we can do it with smaller changes by hooking into the MD generation pipeline. Unblocking anyway, since I'm not sure how much that matters for the end result or build times.
scripts/generate-md-exports.mjs
```javascript
/**
 * Injects a description and navigation links after the first H1 heading.
 * Returns the original content unchanged if no H1 is found.
 */
function injectDescription(markdown, description, {navLinks = []} = {}) {
  // ...
}
```
Eww, I'm sure we could have done this just as easily by hooking into the markdown pipeline.
scripts/generate-md-exports.mjs
```javascript
const toUpload = [...r2Uploads].filter(
  ([key, data]) => existingFilesOnR2.get(key) !== md5(data)
);
```
Grossly inefficient to use `[...r2Uploads]` only to run `.filter()` on it. Either a `for...of` loop or `Array.from()` should be used here.
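A sketch of the `for...of` variant, assuming `r2Uploads` is the `Map` of key to content from the earlier commits and `existingFilesOnR2` maps keys to stored MD5 hashes. `selectChangedUploads` and the injected `hashFn` (standing in for the script's `md5`) are hypothetical.

```javascript
// Iterate the Map directly instead of first spreading the whole Map into a
// temporary array just to call .filter() on it.
// `hashFn` stands in for the script's md5 helper (assumption).
function selectChangedUploads(r2Uploads, existingFilesOnR2, hashFn) {
  const toUpload = [];
  for (const [key, data] of r2Uploads) {
    // Skip objects whose content hash already matches what R2 has stored.
    if (existingFilesOnR2.get(key) !== hashFn(data)) {
      toUpload.push([key, data]);
    }
  }
  return toUpload;
}
```

Because `r2Uploads` is a `Map`, a later `set()` for the same key replaces the earlier content, which is why the hash check belongs at upload time rather than at collection time.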
Replace the post-processing description injection loop (read-modify-write on every .md file) with YAML frontmatter emitted in the worker pipeline. Each task now carries metadata from the doctree, and processTaskList prepends a YAML block (title, description, url) before writing to disk. This eliminates the separate read-modify-write pass, removes duplicate R2 uploads for pages with descriptions, and keeps the cache immune to metadata changes since frontmatter is added after cache resolution. Co-Authored-By: Claude <noreply@anthropic.com>
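A hedged sketch of the prepend step, assuming the metadata arrives as plain strings. `prependFrontmatter` and `yamlString` are hypothetical names. Emitting values as JSON-quoted strings is one assumption about how to produce safe YAML (JSON string escapes are valid inside YAML double-quoted scalars), and newlines are collapsed to spaces as a later commit in this PR does.

```javascript
// Quote a value as a single-line YAML double-quoted scalar.
// Newlines become spaces; JSON.stringify escapes quotes and backslashes.
function yamlString(value) {
  return JSON.stringify(String(value).replace(/\r?\n/g, ' '));
}

// Hypothetical sketch: prepend a YAML block with title, description and url.
// Empty fields are omitted; with no fields the Markdown is returned as-is.
function prependFrontmatter(markdown, {title, description, url}) {
  const fields = {title, description, url};
  const lines = Object.entries(fields)
    .filter(([, value]) => value)
    .map(([key, value]) => `${key}: ${yamlString(value)}`);
  if (lines.length === 0) return markdown;
  return `---\n${lines.join('\n')}\n---\n\n${markdown}`;
}
```

Since the block is added only when each file is written, cached bodies stay independent of title or description edits.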
- Strip /index from nested paths (e.g. dev/index -> dev/), not just the top-level index
- Replace newlines with spaces in YAML frontmatter title/description to prevent invalid YAML output

Co-Authored-By: Claude <noreply@anthropic.com>
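A minimal sketch of the nested `/index` stripping, assuming doctree paths are `/`-separated and public URLs end with a trailing slash. `toUrlPath` is a hypothetical name.

```javascript
// Hypothetical sketch: map a doctree path to a URL path, dropping a trailing
// "index" segment at any depth ("dev/index" -> "/dev/", "index" -> "/").
function toUrlPath(docPath) {
  const parts = docPath.split('/').filter(Boolean);
  if (parts[parts.length - 1] === 'index') parts.pop();
  return parts.length === 0 ? '/' : `/${parts.join('/')}/`;
}
```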
- Only create S3Client when R2 uploads are needed
- Prevent NaN in cache miss rate when all tasks fail

Co-Authored-By: Claude <noreply@anthropic.com>
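Both fixes are small guards. This sketch assumes a factory for the client (in the script, presumably whatever constructs the `S3Client`) and hypothetical `misses`/`succeeded` counters; `lazyClient` and `cacheMissRate` are illustrative names, not the PR's code.

```javascript
// Returns a getter that builds the client on first use and reuses it after,
// so runs with nothing to upload never construct one.
function lazyClient(factory) {
  let client = null;
  return () => {
    if (client === null) client = factory();
    return client;
  };
}

// Guard the division: if every task failed (succeeded === 0), report 0
// instead of dividing by zero and printing NaN.
function cacheMissRate(misses, succeeded) {
  return succeeded > 0 ? misses / succeeded : 0;
}
```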
Cursor Bugbot has reviewed your changes and found 1 potential issue.
```javascript
  };
} else {
  taskFrontmatter = frontmatterMap.get(relativePath) || null;
}
```
Frontmatter map keys break on Windows paths
Medium Severity
`buildFrontmatterMap()` stores keys using the doctree `node.path` with forward slashes, but task lookup uses `relativePath` from `path.relative()`, which can contain backslashes on Windows. This can make `frontmatterMap.get(relativePath)` miss, silently dropping YAML metadata (and the `mdxOverride` `urlPath` cleanup also won’t match `\\index`).
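A possible fix, sketched under the assumption that doctree keys are always `/`-separated: normalize `relativePath` before the lookup. `toDoctreeKey` is a hypothetical helper; the separator is passed in (callers would use `path.sep` from `node:path`) so the Windows case can be exercised on any OS.

```javascript
// Hypothetical helper: convert an OS-specific relative path into the
// forward-slash form used by doctree node.path keys.
function toDoctreeKey(relativePath, sep) {
  // On POSIX the path is already in the right form.
  return sep === '/' ? relativePath : relativePath.split(sep).join('/');
}
```

At the call site this would read `frontmatterMap.get(toDoctreeKey(relativePath, path.sep))`; reusing the normalized key should also let the `urlPath` cleanup look only for `/index`.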


Inject frontmatter `description` as italic text after the H1 heading in generated `.md` exports so LLM agents can quickly assess page relevance from the first few lines. Currently the description is only used for HTML meta tags and never appears in MD output, so agents reading the first ~500 lines get a title and then jump straight into content with no summary.
The injection happens as a post-processing step after workers finish and child sections are appended. MDX override pages (which have custom intros) are skipped. A shared `collectR2Upload` helper consolidates the R2 sync logic for both child-section and description-injection modifications.
Also:
- Add `description` frontmatter to 11 platform root pages
- Update `specs/llm-friendly-docs.md` with Description Injection section
- Update `docs/contributing/pages/llm-support.mdx`
- Set up `@sentry/dotagents` with `brand-guidelines` skill
- Wire `dotagents install` into Makefile `develop` target

Fixes #16420