Add AI agent discovery: llms.txt, llms-full.txt, per-page index.md #11433

Open
retran wants to merge 1 commit into mendix:development from retran:av-ai-agent-docs


retran (Contributor) commented Jun 26, 2026

Why

AI assistants and LLM-based tools can already browse the web, but they work better when documentation is served as clean Markdown rather than HTML. The llms.txt convention is emerging as a standard way for sites to expose machine-readable content — similar to what sitemap.xml does for search crawlers.

Mendix docs are already a rich, well-structured knowledge base. This PR makes that structure directly accessible to AI agents: they can fetch a single index file to discover all pages, follow links to read individual pages as Markdown, or ingest the full corpus in one request for RAG pipelines.

What

  • /llms.txt — site index in llms.txt spec format: H1 title, blockquote summary, full ToC-ordered indented link tree (4255 pages). All links point to index.html.md files.
  • /llms-full.txt — complete Markdown content of every page concatenated in ToC order, with URL: / Markdown: / Description: metadata per entry. Suitable for offline RAG ingestion.
  • /{page}/index.html.md — clean Markdown version of every page (home, section, leaf), following the llms.txt spec convention for directory-style URLs. Internal links are rewritten from HTML paths (/path/) to Markdown paths (/path/index.html.md) so agents can follow them. External links are unchanged. 4257 files — one per HTML content page.
  • /robots.txt — explicit Allow: / for 13 AI crawlers in production (GPTBot, ClaudeBot, Google-Extended, OAI-SearchBot, ChatGPT-User, Claude-SearchBot, Claude-User, PerplexityBot, Meta-ExternalAgent, Applebot-Extended, Diffbot, CCBot, Bytespider).
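The robots.txt behavior can be illustrated with a short Python sketch. It is not the Hugo template from this PR: the function name `ai_crawler_rules` and the `production` flag are hypothetical, and one `User-agent` group per crawler is an assumed layout. The crawler names come from the list above, and the non-production `Disallow: /` rule follows the implementation notes.

```python
# Crawler names are copied from the PR description. The function name and
# the "production" flag are hypothetical, used only for this illustration.
AI_CRAWLERS = [
    "GPTBot", "ClaudeBot", "Google-Extended", "OAI-SearchBot",
    "ChatGPT-User", "Claude-SearchBot", "Claude-User", "PerplexityBot",
    "Meta-ExternalAgent", "Applebot-Extended", "Diffbot", "CCBot",
    "Bytespider",
]


def ai_crawler_rules(production: bool) -> str:
    """Return one robots.txt group per AI crawler."""
    # Production allows everything; any other environment blocks everything.
    rule = "Allow: /" if production else "Disallow: /"
    groups = [f"User-agent: {name}\n{rule}" for name in AI_CRAWLERS]
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(ai_crawler_rules(production=True))
```

Keeping the crawler list in one place means the production and non-production outputs cannot drift apart; only the rule line differs.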

Coverage

| Artifact | Count |
| --- | --- |
| Real HTML content pages | 4257 |
| index.html.md files generated | 4257 (exact match) |
| llms.txt entries | 4255 (excludes draft pages) |
| llms-full.txt entries | 4254 (excludes draft pages) |
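A count such as the llms.txt entry total could be checked by parsing the index and counting link lines. The sketch below assumes each entry has the shape `- [Title](url): optional note` and that nesting uses two spaces per level; both are assumptions about the generated file, and `parse_entries` and the sample text are hypothetical.

```python
import re

# Assumed entry shape: "- [Title](url): optional note", nested with leading
# spaces. The two-space indent step is an assumption about the output.
ENTRY = re.compile(r"^(?P<indent> *)- \[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)")


def parse_entries(text: str, indent_step: int = 2):
    """Return (depth, title, url) for each link line in an llms.txt body."""
    entries = []
    for line in text.splitlines():
        m = ENTRY.match(line)
        if m:
            depth = len(m.group("indent")) // indent_step
            entries.append((depth, m.group("title"), m.group("url")))
    return entries


# Hypothetical sample, not taken from the real Mendix index.
SAMPLE = """\
# Example Docs

> Hypothetical summary line.

- [Guide](/guide/index.html.md): Top-level section
  - [Install](/guide/install/index.html.md): Leaf page
  - [Upgrade](/guide/upgrade/index.html.md)
"""

if __name__ == "__main__":
    entries = parse_entries(SAMPLE)
    print(len(entries))  # number of link entries found in the sample
```

Running the same function over the real /llms.txt and comparing `len(entries)` with the number of generated index.html.md files would show how many pages the index leaves out.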

Implementation notes

  • Hugo output format PAGEMD (baseName=index.html, mediaType=text/markdown) generates index.html.md for home, section, and page kinds
  • single.pagemd.md handles leaf pages; list.pagemd.md handles section pages (overrides docsy's all.md catch-all)
  • Link rewriting uses Go RE2 regex: matches ](/absolute/path/) and ](/path/#anchor), skips anything with : (external URLs)
  • llms.txt and llms-full.txt walk the page tree via a shared recursive Hugo partial in weight/ToC order from the landingpage root section
  • All AI crawler rules are Disallow: / in non-production environments
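The link-rewriting rule can be approximated in Python as follows. This is a sketch of the described behavior, not the Hugo template code from this PR: the pattern, the function name `rewrite_links`, and the `index.html.md` target (taken from the PR description) are assumptions. The pattern uses no lookarounds, which RE2 does not support, so a similar expression should carry over to Go.

```python
import re

# Matches "](/path/)" and "](/path/#anchor)". The path must start with "/"
# and end with "/", and may not contain ":" (so URLs with a scheme are
# skipped), "#", ")" or whitespace. Hypothetical pattern for illustration.
INTERNAL_LINK = re.compile(r"\]\((/(?:[^):#\s]*/)?)(#[^)\s]*)?\)")


def rewrite_links(markdown: str) -> str:
    """Point directory-style internal links at their Markdown twin."""

    def repl(m: re.Match) -> str:
        path, anchor = m.group(1), m.group(2) or ""
        # The anchor, if present, moves after the file name.
        return f"]({path}index.html.md{anchor})"

    return INTERNAL_LINK.sub(repl, markdown)


if __name__ == "__main__":
    print(rewrite_links("See [pages](/refguide/pages/#overview)."))
```

Links that do not end in a slash, such as image paths, are left alone because the pattern requires the trailing `/` before the closing parenthesis or anchor.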

🤖 Generated with Claude Code

…tput

Generates four artifacts that make the docs consumable by AI agents
and LLM-based tools:

- llms.txt — full site index in llms.txt spec format: H1 title,
  blockquote summary, indented bullet links with descriptions in ToC
  order. Links point to index.md files.

- llms-full.txt — complete Markdown content of every page in a single
  file, in ToC order, with page metadata (URL, Markdown permalink,
  description) and raw content per page.

- {page}/index.md — clean Markdown version of every page (home,
  section, leaf). Internal links are rewritten from HTML paths
  (/path/to/page/) to Markdown paths (/path/to/page/index.md),
  including anchor fragments (/path/#section → /path/index.md#section).
  External links (https://) are left unchanged.

- robots.txt — explicit Allow rules for 13 AI crawlers (GPTBot,
  ClaudeBot, Google-Extended, PerplexityBot, CCBot, Bytespider,
  OAI-SearchBot, ChatGPT-User, Claude-SearchBot, Claude-User,
  Meta-ExternalAgent, Applebot-Extended, Diffbot), including training
  data collection bots, in production only.

Coverage: 4257 index.md files match 4257 real HTML content pages
(total minus 1105 alias redirects). llms.txt and llms-full.txt cover
all non-draft pages (~4255).
@retran retran force-pushed the av-ai-agent-docs branch from 30c36df to f6cd82a Compare June 26, 2026 14:34