feat(md-exports): Inject frontmatter descriptions into MD exports for LLM relevance #16468

Open
dcramer wants to merge 13 commits into master from
optimize-md-exports-llm-relevance

Conversation

@dcramer
Member

@dcramer dcramer commented Feb 19, 2026

Inject frontmatter description as italic text after the H1 heading in
generated .md exports so LLM agents can quickly assess page relevance
from the first few lines. Currently the description is only used for HTML
meta tags and never appears in MD output — this means agents reading the
first ~500 lines get a title and then jump straight into content with no
summary.

The injection happens as a post-processing step after workers finish and
child sections are appended. MDX override pages (which have custom intros)
are skipped. A shared collectR2Upload helper consolidates the R2 sync
logic for both child-section and description-injection modifications.
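
The injection step described above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: the name `injectDescriptionSketch` and the `isMdxOverride` flag are hypothetical, and it assumes the H1 is a line of the form `# Title`.

```javascript
// Hypothetical sketch: insert an italic description after the first H1.
// Assumes the heading is a line of the form "# Title".
function injectDescriptionSketch(markdown, description, {isMdxOverride = false} = {}) {
  // MDX override pages have custom intros, so they are left untouched.
  if (isMdxOverride || !description) {
    return markdown;
  }
  const h1 = /^# .+$/m;
  if (!h1.test(markdown)) {
    return markdown; // no H1 found: return the content unchanged
  }
  // Only the first H1 is replaced because the regex has no "g" flag.
  return markdown.replace(h1, heading => `${heading}\n\n*${description}*`);
}

const page = '# Flask\n\nInstall the SDK.';
console.log(injectDescriptionSketch(page, 'Learn how to set up Sentry with Flask.'));
```

With this shape, an agent reading only the top of the file sees the title followed immediately by a one-line summary.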

Also:

  • Add missing description frontmatter to 11 platform root pages
  • Update specs/llm-friendly-docs.md with Description Injection section
  • Strengthen LLM writing guidance in docs/contributing/pages/llm-support.mdx
  • Set up @sentry/dotagents with brand-guidelines skill
  • Add Content Authoring section to AGENTS.md referencing all three skills
  • Wire dotagents install into Makefile develop target

Fixes #16420

… LLM relevance

Inject frontmatter `description` as italic text after the H1 heading in
generated .md exports so LLM agents can quickly assess page relevance.
MDX override pages are skipped since they have custom intros.

Also:
- Add missing descriptions to 11 platform root pages
- Update specs and contributing docs with description injection details
- Set up @sentry/dotagents with brand-guidelines skill
- Add Content Authoring section to AGENTS.md referencing skills
- Wire dotagents install into Makefile develop target

Fixes #16420
Co-Authored-By: Claude <noreply@anthropic.com>
@vercel
vercel bot commented Feb 19, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
develop-docs | Ready | Preview, Comment | Feb 24, 2026 8:09pm
sentry-docs | Ready | Preview, Comment | Feb 24, 2026 8:09pm

@codeowner-assignment codeowner-assignment bot requested review from a team February 19, 2026 21:31
Inject a documentation index link and platform-specific navigation
after the description in MD exports. Guide pages (e.g., Flask) also
get a link back to their platform index (e.g., Python SDK docs).
This helps LLM agents navigate between related pages.

Co-Authored-By: Claude <noreply@anthropic.com>
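
As a rough illustration of the guide-to-platform link described in the commit above, the sketch below derives a platform index path from a page path. The path layout (`platforms/<platform>/guides/<guide>`) and the function name are assumptions made for the example; the PR's actual logic is not shown in this conversation.

```javascript
// Hypothetical sketch: find the platform index for a guide page.
// Assumes guide pages live under platforms/<platform>/guides/<guide>/...
function platformIndexPathSketch(pagePath) {
  const pathParts = pagePath.split('/').filter(Boolean);
  const isGuide =
    pathParts.length >= 4 && pathParts[0] === 'platforms' && pathParts[2] === 'guides';
  // Non-guide pages get no back-link in this sketch.
  return isGuide ? `${pathParts[0]}/${pathParts[1]}` : null;
}

console.log(platformIndexPathSketch('platforms/python/guides/flask'));
console.log(platformIndexPathSketch('platforms/python'));
```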
dcramer and others added 2 commits February 19, 2026 13:40
Let dotagents manage skill installation at dev time rather than
checking them into the repo. `make develop` runs `dotagents install`.

Co-Authored-By: Claude <noreply@anthropic.com>
Use a Map for R2 uploads so description injection overwrites stale
entries from child section appending for the same page. Use a replacer
function in injectDescription to avoid $ in descriptions being
interpreted as regex replacement patterns.

Co-Authored-By: Claude <noreply@anthropic.com>
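
Both fixes in the commit above can be shown in a few lines. The sketch uses made-up page content and a hypothetical `pricing.md` key. It relies only on standard behavior: `String.prototype.replace` expands patterns such as `$1` in a replacement string but not in a replacer function's return value, and `Map.prototype.set` overwrites an existing key.

```javascript
const markdown = '# Pricing\n\nBody';
const description = 'Plans start at $1 per month.';
const h1 = /^(# .+)$/m;

// Replacement string: "$1" inside the description is expanded to the
// first capture group, corrupting the text.
const broken = markdown.replace(h1, `$1\n\n*${description}*`);

// Replacer function: the returned string is inserted literally.
const safe = markdown.replace(h1, match => `${match}\n\n*${description}*`);

// A Map keeps one entry per key, so the later write replaces the stale one.
const r2Uploads = new Map();
r2Uploads.set('pricing.md', 'child sections only');
r2Uploads.set('pricing.md', safe);

console.log(broken);
console.log(safe);
console.log(r2Uploads.size);
```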
dcramer and others added 2 commits February 19, 2026 13:57
Move hash comparison from collectR2Upload to upload time so the Map
always holds the latest content per key. Previously, if description
injection produced content matching R2's existing hash, the stale
child-section-only entry would persist and get uploaded.

Also remove duplicate pathParts computation in guide link logic.

Co-Authored-By: Claude <noreply@anthropic.com>
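
The scenario this commit fixes can be reproduced with a small sketch. The `md5` helper, the `collectUploadSketch` function, and the file names are stand-ins for illustration; only the names `r2Uploads` and `existingFilesOnR2` come from the diff quoted later in this conversation.

```javascript
const crypto = require('node:crypto');

// Assumed helper: hex MD5 digest of a string.
const md5 = data => crypto.createHash('md5').update(data).digest('hex');

// Hypothetical state: R2 already holds the final version of a.md.
const existingFilesOnR2 = new Map([['a.md', md5('final content')]]);
const r2Uploads = new Map();

// Collection step: always store the latest content, with no hash check here.
function collectUploadSketch(key, data) {
  r2Uploads.set(key, data);
}

collectUploadSketch('a.md', 'child sections only'); // differs from R2
collectUploadSketch('a.md', 'final content'); // matches R2, replaces the stale entry
collectUploadSketch('b.md', 'new page'); // not on R2 yet

// Upload time: compare each key once against what R2 already has.
const toUpload = [...r2Uploads].filter(
  ([key, data]) => existingFilesOnR2.get(key) !== md5(data)
);

console.log(toUpload.map(([key]) => key));
```

Had the hash check run inside the collection step, the second write to `a.md` would have been skipped as "unchanged" and the stale first write would have been uploaded.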
@BYK
Member

BYK commented Feb 20, 2026

@dcramer why don't we just expose the frontmatter? That's how Cloudflare did it, so apparently agents are fine with that.

@BYK
Member

BYK commented Feb 20, 2026

We can use remark-frontmatter in the pipeline and just feed it the page title, description etc.

Member

@BYK BYK left a comment

Not a fan of doing this in a post-processing script, and I strongly believe we can do this with smaller changes by hooking into the MD generation pipeline. Unblocking anyway, as I'm not sure how much that matters for the end result or build times.

/**
 * Injects a description and navigation links after the first H1 heading.
 * Returns the original content unchanged if no H1 is found.
 */
function injectDescription(markdown, description, {navLinks = []} = {}) {
Member

Eww, I'm sure we could have done this equally easy by hooking into the markdown pipeline.

Comment on lines 837 to 838
const toUpload = [...r2Uploads].filter(
([key, data]) => existingFilesOnR2.get(key) !== md5(data)
Member

Grossly inefficient to use [...r2Uploads] only to run .filter on it. Either a for...of loop here or Array.from() should be used.
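
One possible reading of the `for...of` suggestion is sketched below. It iterates the Map directly, so the spread's intermediate array is never built. The `md5` helper and the function name `selectUploadsSketch` are assumptions for the example, not code from the PR.

```javascript
const crypto = require('node:crypto');

// Assumed helper: hex MD5 digest of a string.
const md5 = data => crypto.createHash('md5').update(data).digest('hex');

// Hypothetical sketch: pick entries whose content differs from what R2 holds.
function selectUploadsSketch(r2Uploads, existingFilesOnR2) {
  const toUpload = [];
  // Map iteration yields [key, value] pairs without copying the Map first.
  for (const [key, data] of r2Uploads) {
    if (existingFilesOnR2.get(key) !== md5(data)) {
      toUpload.push([key, data]);
    }
  }
  return toUpload;
}

const pending = new Map([['a.md', 'x'], ['b.md', 'y']]);
const onR2 = new Map([['a.md', md5('x')]]);
console.log(selectUploadsSketch(pending, onR2));
```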

Replace the post-processing description injection loop (read-modify-write
on every .md file) with YAML frontmatter emitted in the worker pipeline.
Each task now carries metadata from the doctree, and processTaskList
prepends a YAML block (title, description, url) before writing to disk.

This eliminates the separate read-modify-write pass, removes duplicate
R2 uploads for pages with descriptions, and keeps the cache immune to
metadata changes since frontmatter is added after cache resolution.

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Claude <noreply@anthropic.com>
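
A minimal sketch of prepending a YAML block with `title`, `description`, and `url`, as the commit above describes. The function names and the example URL are hypothetical. Emitting each value with `JSON.stringify` is a choice made for this example (YAML 1.2 accepts JSON strings as double-quoted scalars); the PR may format values differently.

```javascript
// Hypothetical sketch: build a YAML frontmatter block from page metadata.
function buildFrontmatterSketch({title, description, url}) {
  const lines = ['---'];
  for (const [key, value] of Object.entries({title, description, url})) {
    if (value == null || value === '') {
      continue; // skip missing fields
    }
    // A JSON string is a valid YAML double-quoted scalar, so colons,
    // quotes, and similar characters in the value stay safe.
    lines.push(`${key}: ${JSON.stringify(String(value))}`);
  }
  lines.push('---', '');
  return lines.join('\n');
}

// Prepend the block to the generated markdown, separated by a blank line.
function withFrontmatterSketch(markdown, meta) {
  return `${buildFrontmatterSketch(meta)}\n${markdown}`;
}

console.log(
  withFrontmatterSketch('# Flask\n\nBody', {
    title: 'Flask',
    description: 'Flask: setup guide',
    url: 'https://example.com/flask/',
  })
);
```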
dcramer and others added 2 commits February 24, 2026 11:40
- Strip /index from nested paths (e.g. dev/index -> dev/) not just
  top-level index
- Replace newlines with spaces in YAML frontmatter title/description
  to prevent invalid YAML output

Co-Authored-By: Claude <noreply@anthropic.com>
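
The two fixes in this commit can be sketched as small helpers. Both names are hypothetical, and mapping a top-level `index` to an empty string is an assumption made for the example.

```javascript
// Hypothetical sketch: drop a trailing "index" segment from a URL path.
// "dev/index" becomes "dev/"; a bare "index" becomes "" (assumed here).
function stripIndexSketch(urlPath) {
  return urlPath.replace(/(^|\/)index$/, '$1');
}

// Hypothetical sketch: replace newlines with spaces so a value stays on
// a single line of YAML.
function toSingleLineSketch(text) {
  return text.replace(/\r?\n/g, ' ');
}

console.log(stripIndexSketch('dev/index'));
console.log(toSingleLineSketch('Line one\nline two'));
```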
dcramer and others added 2 commits February 24, 2026 11:56
- Only create S3Client when R2 uploads are needed
- Prevent NaN in cache miss rate when all tasks fail

Co-Authored-By: Claude <noreply@anthropic.com>
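
The NaN case arises from dividing zero by zero when no task produced a cache hit or miss. A guard along these lines avoids it; the function name and the hit/miss counters are assumptions, since the PR's bookkeeping is not shown here.

```javascript
// Hypothetical sketch: cache miss rate that stays numeric when nothing ran.
function cacheMissRateSketch(hits, misses) {
  const total = hits + misses;
  // 0 / 0 is NaN in JavaScript, so report 0 when there is no data.
  return total === 0 ? 0 : misses / total;
}

console.log(cacheMissRateSketch(3, 1));
console.log(cacheMissRateSketch(0, 0));
```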
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

};
} else {
taskFrontmatter = frontmatterMap.get(relativePath) || null;
}
Frontmatter map keys break on Windows paths

Medium Severity

buildFrontmatterMap() stores keys using doctree node.path with forward slashes, but task lookup uses relativePath from path.relative(), which can contain backslashes on Windows. This can make frontmatterMap.get(relativePath) miss, silently dropping YAML metadata (and mdxOverride urlPath cleanup also won’t match \\index).
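
The mismatch Bugbot describes can be reproduced on any OS with Node's `path.win32` variant. The sketch below uses a hypothetical map entry and helper name; normalizing the lookup key to forward slashes is one possible fix, not necessarily the one the PR adopts.

```javascript
const path = require('node:path');

// Hypothetical helper: convert backslash separators to forward slashes.
function toPosixPathSketch(p) {
  return p.replace(/\\/g, '/');
}

// Keys use forward slashes, as doctree paths are described to.
const frontmatterMap = new Map([['platforms/python/index.mdx', {title: 'Python'}]]);

// path.win32 applies Windows path semantics regardless of the host OS.
const relativePath = path.win32.relative(
  'C:\\repo\\docs',
  'C:\\repo\\docs\\platforms\\python\\index.mdx'
);

console.log(relativePath); // contains backslashes
console.log(frontmatterMap.get(relativePath)); // lookup misses
console.log(frontmatterMap.get(toPosixPathSketch(relativePath))); // lookup hits
```

Replacing every backslash is safe for Windows-style paths but would also rewrite a literal backslash in a POSIX file name, so `p.split(path.sep).join('/')` may be the more conservative choice.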

Development

Successfully merging this pull request may close these issues.

Optimize .md exports for LLM agent page-relevance heuristics

2 participants