Add AI agent discovery: llms.txt, llms-full.txt, per-page index.md #11433
Open
retran wants to merge 1 commit into
…tput
Generates the following artifacts that make the docs consumable by AI
agents and LLM-based tools:
- llms.txt — full site index in llms.txt spec format: H1 title,
blockquote summary, indented bullet links with descriptions in ToC
order. Links point to index.md files.
- llms-full.txt — complete Markdown content of every page in a single
file, in ToC order, with page metadata (URL, Markdown permalink,
description) and raw content per page.
- {page}/index.md — clean Markdown version of every page (home,
section, leaf). Internal links are rewritten from HTML paths
(/path/to/page/) to Markdown paths (/path/to/page/index.md),
including anchor fragments (/path/#section → /path/index.md#section).
External links (https://) are left unchanged.
- robots.txt — explicit Allow rules for 13 AI crawlers (GPTBot,
ClaudeBot, Google-Extended, PerplexityBot, CCBot, Bytespider,
OAI-SearchBot, ChatGPT-User, Claude-SearchBot, Claude-User,
Meta-ExternalAgent, Applebot-Extended, Diffbot), including training
data collection bots, in production only.
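The internal-link rewrite described above can be illustrated with a minimal sketch. It assumes links appear as inline Markdown `](target)` and uses Python's `re` module instead of the Hugo templates this PR adds; `LINK_RE`, `rewrite_links`, and the `md_name` parameter are hypothetical names. `md_name` defaults to `index.md` as in this commit message.

```python
import re

# Hypothetical pattern: an inline Markdown link target that starts with "/",
# ends with "/" (or is just "/"), and may carry a "#fragment". Neither the path
# nor the fragment may contain ":", so external URLs like https://... never match.
LINK_RE = re.compile(r"\]\((/(?:[^):#\s]*/)?)(#[^):\s]*)?\)")


def rewrite_links(markdown: str, md_name: str = "index.md") -> str:
    """Point internal directory-style links at their Markdown versions."""

    def repl(m: re.Match) -> str:
        path, fragment = m.group(1), m.group(2) or ""
        # /path/#section -> /path/index.md#section
        return f"]({path}{md_name}{fragment})"

    return LINK_RE.sub(repl, markdown)


# prints: See [setup](/howto/setup/index.md#install) and [site](https://example.com/).
print(rewrite_links("See [setup](/howto/setup/#install) and [site](https://example.com/)."))
```

Links without a trailing slash (for example `/images/logo.png`) do not match the pattern and are left unchanged, so only directory-style page URLs gain the Markdown suffix.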
Coverage: 4257 index.md files match 4257 real HTML content pages
(total minus 1105 alias redirects). llms.txt and llms-full.txt cover
all non-draft pages (~4255).
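The `llms.txt` layout described in the first bullet above (H1 title, blockquote summary, indented bullet links with descriptions in ToC order) can be sketched as a recursive tree walk. This is a minimal illustration, not the PR's Hugo partial: `Page`, `md_link`, `walk`, and `render_llms_txt` are hypothetical, and breaking weight ties by title is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Page:
    # Hypothetical stand-in for a Hugo page: sections have children, leaves do not.
    title: str
    url: str  # directory-style permalink, e.g. "/a/b/"
    description: str = ""
    weight: int = 0
    children: List["Page"] = field(default_factory=list)


def md_link(url: str) -> str:
    # Map a directory-style URL to its Markdown twin: /a/b/ -> /a/b/index.md
    return url + "index.md" if url.endswith("/") else url


def walk(page: Page, depth: int, out: List[str]) -> None:
    # Depth-first, children ordered by weight (title breaks ties; an assumption).
    for child in sorted(page.children, key=lambda p: (p.weight, p.title)):
        line = f"{'  ' * depth}- [{child.title}]({md_link(child.url)})"
        if child.description:
            line += f": {child.description}"
        out.append(line)
        walk(child, depth + 1, out)


def render_llms_txt(root: Page, summary: str) -> str:
    # H1 title, blockquote summary, then the indented link tree.
    out = [f"# {root.title}", "", f"> {summary}", ""]
    walk(root, 0, out)
    return "\n".join(out) + "\n"
```

The Python sort stands in for ordering by weight in the template; the same walk, emitting each page's raw content instead of a bullet, would yield the `llms-full.txt` ordering.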
Why
AI assistants and LLM-based tools can already browse the web, but they work better when documentation is served as clean Markdown rather than HTML. The llms.txt convention is emerging as a standard way for sites to expose machine-readable content, similar to what `sitemap.xml` does for search crawlers.

Mendix docs are already a rich, well-structured knowledge base. This PR makes that structure directly accessible to AI agents: they can fetch a single index file to discover all pages, follow links to read individual pages as Markdown, or ingest the full corpus in one request for RAG pipelines.
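On the agent side, discovery amounts to reading the index and resolving its links. A minimal sketch, assuming entries use the `- [Title](url): description` bullet form and a hypothetical site root `https://docs.example.com/`; `ENTRY_RE`, `parse_llms_txt`, and `absolute_urls` are illustrative names, and the actual HTTP fetching is left out.

```python
import re
from typing import List, Tuple
from urllib.parse import urljoin

# Hypothetical pattern for one llms.txt entry: "- [Title](url)" with an
# optional ": description"; leading spaces (nesting depth) are ignored.
ENTRY_RE = re.compile(r"^\s*- \[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$")


def parse_llms_txt(text: str) -> List[Tuple[str, str, str]]:
    """Return (title, url, description) for every link bullet in llms.txt."""
    entries = []
    for line in text.splitlines():
        m = ENTRY_RE.match(line)
        if m:
            entries.append((m.group(1), m.group(2), m.group(3) or ""))
    return entries


def absolute_urls(base: str, entries: List[Tuple[str, str, str]]) -> List[str]:
    # Resolve site-relative links (e.g. /a/index.html.md) against the site root.
    return [urljoin(base, url) for _, url, _ in entries]
```

An agent would then fetch each resolved Markdown URL it needs, or skip the crawl entirely and read `/llms-full.txt` in one request.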
What
- `/llms.txt`: site index in llms.txt spec format (H1 title, blockquote summary, full ToC-ordered indented link tree; 4255 pages). All links point to `index.html.md` files.
- `/llms-full.txt`: complete Markdown content of every page, concatenated in ToC order, with `URL:` / `Markdown:` / `Description:` metadata per entry. Suitable for offline RAG ingestion.
- `/{page}/index.html.md`: clean Markdown version of every page (home, section, leaf), following the llms.txt spec convention for directory-style URLs. Internal links are rewritten from HTML paths (`/path/`) to Markdown paths (`/path/index.html.md`) so agents can follow them. External links are unchanged. 4257 files, one per HTML content page.
- `/robots.txt`: explicit `Allow: /` for 13 AI crawlers in production (GPTBot, ClaudeBot, Google-Extended, OAI-SearchBot, ChatGPT-User, Claude-SearchBot, Claude-User, PerplexityBot, Meta-ExternalAgent, Applebot-Extended, Diffbot, CCBot, Bytespider).

Coverage
- `index.html.md` files generated: 4257 (one per HTML content page)
- `llms.txt` entries: 4255
- `llms-full.txt` entries: ~4255 (all non-draft pages)

Implementation notes
- `PAGEMD` output format (`baseName=index.html`, `mediaType=text/markdown`) generates `index.html.md` for the `home`, `section`, and `page` kinds
- `single.pagemd.md` handles leaf pages; `list.pagemd.md` handles section pages (overrides Docsy's `all.md` catch-all)
- Link rewriting matches `](/absolute/path/)` and `](/path/#anchor)` and skips anything containing `:` (external URLs)
- `llms.txt` and `llms-full.txt` walk the page tree via a shared recursive Hugo partial in weight/ToC order from the `landingpage` root section
- `robots.txt` serves `Disallow: /` in non-production environments

🤖 Generated with Claude Code