docs: omit .md pages from llms.txt without removing them completely #2480
Preview for this PR was built for commit
Cheers. Could we please add some tests for these special pages to ensure the `.md` version works, and also that the HTML version contains the …
The builds will fail until we deploy it, or do we need to make some changes to the test assertions in …
The tests should now run correctly against the staging …
Great, thank you.
marcel-rbro left a comment
Can we nudge the review? Is @B4nan the one to review the technical side?
```shell
    include ${PWD_PATH}/nginx-test.conf;
}
EOF
sed -i 's|https://apify.github.io/apify-docs|http://localhost:3000|g' default.conf
```
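For reference, the `sed` line above uses `|` as the delimiter and the `g` flag, so every occurrence of the production upstream URL is replaced with the local Docusaurus server. A minimal JavaScript sketch of the same rewrite follows; `rewriteUpstream` and the sample `proxy_pass` block are hypothetical, since the real config contents aren't shown in this thread.

```javascript
// Hypothetical JavaScript equivalent of:
//   sed -i 's|https://apify.github.io/apify-docs|http://localhost:3000|g' <file>
// Note: sed treats the pattern as a regex ("." matches any character);
// a literal replaceAll is close enough for this URL.
const PROD_UPSTREAM = "https://apify.github.io/apify-docs";
const LOCAL_UPSTREAM = "http://localhost:3000";

function rewriteUpstream(confText) {
  // sed's "g" flag replaces every match on each line; replaceAll
  // replaces every match in the whole text, which is equivalent here.
  return confText.replaceAll(PROD_UPSTREAM, LOCAL_UPSTREAM);
}

// Illustrative server block; not the repo's actual nginx config.
const sample = [
  "location / {",
  "    proxy_pass https://apify.github.io/apify-docs;",
  "}",
].join("\n");

console.log(rewriteUpstream(sample));
```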
Are those changes actually necessary? The CI checks were working fine before; I was expecting you to just add a few more test cases here.
Without the change, the tests fail on:
`Expected 'text/markdown' in 'Content-Type' for http://localhost:8080/sdk.md`
Looks like nginx is proxying to the deployed production site instead of the locally built one.
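The failure message suggests the check looks for `text/markdown` inside the `Content-Type` header of the response. A rough JavaScript sketch of such a check follows; `assertHeader` is a hypothetical stand-in for the workflow's shell helper `assert_header`, and its matching rules (case-insensitive header names, substring match on the value) are assumptions, not taken from the workflow.

```javascript
// Hypothetical JS counterpart of the workflow's shell helper assert_header.
// Assumed semantics: case-insensitive header lookup, substring match on value.
function assertHeader(url, headers, name, expected) {
  // HTTP header names are case-insensitive, so normalize before lookup.
  const key = Object.keys(headers).find(
    (k) => k.toLowerCase() === name.toLowerCase(),
  );
  const actual = key === undefined ? "" : String(headers[key]);
  // Substring match: "text/markdown; charset=utf-8" satisfies "text/markdown".
  if (!actual.includes(expected)) {
    throw new Error(
      `Expected '${expected}' in '${name}' for ${url} (got '${actual}')`,
    );
  }
}

// A local build serving the .md route: passes.
assertHeader(
  "http://localhost:8080/sdk.md",
  { "content-type": "text/markdown; charset=utf-8" },
  "Content-Type",
  "text/markdown",
);

// An upstream answering with HTML for the same route: throws the CI-style error.
try {
  assertHeader(
    "http://localhost:8080/sdk.md",
    { "Content-Type": "text/html" },
    "Content-Type",
    "text/markdown",
  );
} catch (err) {
  console.log(err.message);
}
```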
I think that was on purpose; it wasn't possible to test something locally. But I might be wrong; it's been quite some time since I set this up.
OK, so I'm keeping it as it was.
Also, what do you mean by that? We don't have any staging env for the docs.
This referred to the fix mentioned above, which has now been reverted.
B4nan left a comment
I don't want to block this; feel free to put the changes back and merge.
Btw, I was also confused by you renaming the nginx config to `nginx-test.conf`; I don't think that's necessary.
…ges-from-llms-txt-without-removing-them-completely
#2496)

## Summary

- The `sed` substitution in the `Start Nginx with project config` step ran against `default.conf`, which only contains the wrapper (`worker_processes`, `events`, `http { include nginx.conf; }`), not the `https://apify.github.io/apify-docs` upstream URL, which lives in `nginx.conf`. So the rewrite was a no-op and the header-assertions step silently proxied to live prod instead of the PR's local Docusaurus serve at port 3000.
- This masked regressions in the PR under test (the test could pass purely because prod happened to serve the right content) and caused spurious failures when the PR introduced changes that prod hadn't yet picked up; see #2480, where `assert_header ".../sdk.md" "Content-Type" "text/markdown"` failed because prod hadn't been redeployed yet.

## Test plan

- [ ] CI `Test / Docs build` job passes
- [ ] `Run header assertions` step actually exercises the local build (e.g. break a `.md` route in a follow-up draft and confirm the test now fails)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
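One way to keep this class of bug from coming back is to make the upstream rewrite fail loudly when it matches nothing. A minimal JavaScript sketch of that guard follows, assuming it runs before nginx starts; `rewriteUpstreamOrFail` is hypothetical, and the config strings are illustrative (the wrapper mirrors the description above, the `proxy_pass` block is invented), not the repo's actual files.

```javascript
// Hypothetical guard for the "point nginx at the local build" step.
// Bug fixed in #2496: the substitution targeted default.conf, which has no
// upstream URL, so nothing changed and nginx kept proxying to prod.
const PROD_UPSTREAM = "https://apify.github.io/apify-docs";
const LOCAL_UPSTREAM = "http://localhost:3000";

function rewriteUpstreamOrFail(fileName, confText) {
  // Count occurrences first so a no-op substitution is an error, not a pass.
  const hits = confText.split(PROD_UPSTREAM).length - 1;
  if (hits === 0) {
    throw new Error(
      `${fileName}: '${PROD_UPSTREAM}' not found; rewrite would be a no-op`,
    );
  }
  return confText.replaceAll(PROD_UPSTREAM, LOCAL_UPSTREAM);
}

// Wrapper config as described above (directive values are illustrative).
const defaultConf =
  "worker_processes 1;\nevents {}\nhttp { include nginx.conf; }\n";
// Illustrative project config containing the upstream URL.
const nginxConf =
  "location / {\n    proxy_pass https://apify.github.io/apify-docs;\n}\n";

try {
  rewriteUpstreamOrFail("default.conf", defaultConf);
} catch (err) {
  console.log(err.message); // the original silent bug, now surfaced
}
console.log(rewriteUpstreamOrFail("nginx.conf", nginxConf));
```

In the workflow itself, the equivalent would be a shell check (for example `grep -q` on the URL) before the `sed` call; the sketch only makes the intent explicit.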
Follow-up to #2470. Listing pages in the llms-txt plugin's `excludeRoutes` also drops their `/<route>.md` counterparts from the build, so URLs like `https://docs.apify.com/sdk.md` started returning 404 (raised in #2470 (comment)).

This PR moves the exclusion from build time to post-build:

- `docusaurus.config.js`: revert `excludeRoutes` back to just `/` and `/search`; add a NOTE so future contributors don't reintroduce the regression.
- `scripts/joinLlmsFiles.mjs`: add `LLMS_INDEX_EXCLUDE_PATTERNS` and a `filterLlmsIndex()` postbuild step that strips matching `- [Title](url)` entries (and now-empty `## Section` headings) from the generated `build/llms.txt`. The `.md` files stay on disk and continue to be served. Also fixes a pre-existing fire-and-forget race between `joinFiles()` and `sanitizeFile()`.
- `package.json`: add `@docusaurus/utils` as a direct dependency (used for `createMatcher`).
- `.github/workflows/test.yaml`: add regression tests asserting that `/sdk.md`, `/open-source.md`, `/api/v2/actor-builds-get.md`, `/api/v2/dataset-get.md`, and `/academy/tutorials.md` still serve `text/markdown`. Also adds `assert_final_content_type` so child-repo homepages (`/sdk/js`, `/sdk/python`, `/api/client/{js,python}`, `/cli`) are checked through their nginx redirects for both HTML and `Accept: text/markdown` responses.

Net effect: the same `llms.txt` index as #2470 produced, but the per-page `.md` files are restored.

## Test plan

- … `.md`-counterpart and child-repo redirect assertions exercise the regression)
- `npm run build` succeeds locally
- `build/llms.txt` size remains under the 100K limit enforced by `npm run test:llms-size`
- `https://docs.apify.com/sdk.md`, `https://docs.apify.com/open-source.md`, `https://docs.apify.com/api/v2/actor-builds-get.md`
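The post-build filtering of `build/llms.txt` described in this PR can be sketched roughly as follows. This is not the PR's `filterLlmsIndex()`: it is a self-contained approximation that swaps in a simple exact-or-`/**`-prefix matcher for `createMatcher`, uses an illustrative pattern list instead of the real `LLMS_INDEX_EXCLUDE_PATTERNS`, and operates on a string rather than reading the file from disk.

```javascript
// Hypothetical approximation of a post-build llms.txt index filter.
// Illustrative patterns; the real LLMS_INDEX_EXCLUDE_PATTERNS may differ.
const EXCLUDE_PATTERNS = ["/sdk", "/api/v2/**"];

// Simple stand-in for createMatcher: exact path, or "base/**" prefix.
function matchesPattern(path, pattern) {
  if (pattern.endsWith("/**")) {
    const base = pattern.slice(0, -3);
    return path === base || path.startsWith(`${base}/`);
  }
  return path === pattern;
}

// Matches "- [Title](url)" entries and captures the URL.
const ENTRY = /^- \[[^\]]*\]\(([^)]+)\)/;

function filterLlmsIndex(text, patterns) {
  // Pass 1: drop entries whose route (URL path minus ".md") is excluded.
  const kept = text.split("\n").filter((line) => {
    const m = line.match(ENTRY);
    if (!m) return true;
    const path = new URL(m[1], "https://docs.apify.com").pathname
      .replace(/\.md$/, "");
    return !patterns.some((p) => matchesPattern(path, p));
  });
  // Pass 2: drop "## Section" headings with no entries left before the
  // next heading or end of file.
  const out = [];
  for (let i = 0; i < kept.length; i++) {
    if (kept[i].startsWith("## ")) {
      let j = i + 1;
      while (j < kept.length && !kept[j].startsWith("## ") && !ENTRY.test(kept[j])) j++;
      if (j >= kept.length || kept[j].startsWith("## ")) continue;
    }
    out.push(kept[i]);
  }
  // Collapse the blank-line runs left behind by removed sections.
  return out.join("\n").replace(/\n{3,}/g, "\n\n");
}

const sample = [
  "# Apify Documentation",
  "",
  "## SDK",
  "",
  "- [SDK](https://docs.apify.com/sdk.md): SDK overview",
  "",
  "## Academy",
  "",
  "- [Tutorials](https://docs.apify.com/academy/tutorials.md): Tutorials",
].join("\n");

console.log(filterLlmsIndex(sample, EXCLUDE_PATTERNS));
```

Because only the index text is rewritten, the `.md` files under `build/` stay in place, which is the point of moving the exclusion out of `excludeRoutes`.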