feat: replace Elasticsearch search with Pagefind #5106
Migrate documentation search from a self-hosted Elasticsearch index to Pagefind, a static search library that crawls the built HTML in public/ and writes a chunked index into public/pagefind/. This removes the need for an Elasticsearch server, local docker container, CORS workarounds, and the ELASTICSEARCH_* CI secrets.

- package.json: drop @elastic/elasticsearch, cheerio, html-entities and lodash; add pagefind dev dependency and a `pagefind` npm script
- site.yml / extensions.md: remove the generate-index.js Antora extension
- delete ext-antora/generate-index.js and es-docker-compose.yml
- ci.yml: drop ELASTICSEARCH_* env/secrets, run `npm run pagefind` after the Antora build so the deployed site always ships a fresh index
- build-the-docs.md: rewrite the Search section for the Pagefind workflow

Verified locally: `npm run antora` builds cleanly with no reference to the removed extension, and `npm run pagefind` indexes 1359 pages.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Thomas Müller <1005065+DeepDiver1975@users.noreply.github.com>
f4ebd0b to 49de010
frontend part: owncloud/docs-ui#1018
phil-davis left a comment
LGTM, the search indexing will be all done as part of the publishing.
Code review

Pipeline half of the Elasticsearch → Pagefind migration: removes the ES index-generation extension + service and runs the

Strengths

Issues & suggestions

🟡 No guard against an empty index.

    npm run pagefind | tee /tmp/pf.log
    grep -qE "Indexed [1-9][0-9]+ pages" /tmp/pf.log || { echo "Pagefind index suspiciously small"; exit 1; }

🟢 Loose version pin.

🟢 Nit:

Verdict

Approve, with one pre-merge ask: bump
Address review feedback on the Elasticsearch -> Pagefind migration:

- CI ran `npm run pagefind` unconditionally. pagefind exits 0 even when it indexes nothing (e.g. the output dir moves or no indexable pages are emitted), so a broken index would deploy with silently empty search. Tee the output and fail the build unless at least 10 pages were indexed.
- Bump the pagefind floor from ^1.3.0 to ^1.5.0. The docs-ui search markup uses the Component UI web components (pagefind-modal, pagefind-modal-trigger), which were introduced in Pagefind 1.5.0; ^1.3.0 declared compatibility with releases that lack them. The lockfile already resolved 1.5.2, so this only corrects the declared range (no dependency change).

Signed-off-by: Thomas Müller <1005065+DeepDiver1975@users.noreply.github.com>
Addressed the review findings in
Retracting the
DeepDiver1975 left a comment
🤖 Automated review by Claude Code review agent.
Overview
This PR migrates documentation search from a self-hosted Elasticsearch index to Pagefind, a static search library that crawls the built HTML in public/ and emits a chunked static index into public/pagefind/. The change is clean and well-scoped: it removes the 192-line generate-index.js Antora extension, the local ES docker-compose file, the ELASTICSEARCH_* CI secrets/env, and four runtime dependencies (@elastic/elasticsearch, cheerio, html-entities, lodash), replacing them with a single pagefind dev dependency and an npm script. The docs and CI build are updated accordingly. This is a strong net reduction in operational complexity (no server, container, CORS workaround, or secrets) and a sensible architectural direction.
Code quality / style
- The CI guard around `npm run pagefind` is a nice touch: failing the build when the index is empty avoids silently shipping broken search, and the inline comment explains the rationale well.
- Dependency cleanup is thorough: the `package-lock.json` correctly drops the entire transitive tree (cheerio/parse5/htmlparser2/undici/etc.) and the shared deps `debug`/`ms`/`iconv-lite`/`safer-buffer` are correctly re-flagged `"dev": true`.
- The rewritten Search section in `build-the-docs.md` is much simpler and accurate to the new workflow.
Specific suggestions
1. `package-lock.json` name changed `"docs"` → `"orchestrator"` (likely accidental). Line ~2 of the lockfile diff changes the top-level `"name"` field from `docs` to `orchestrator`, but `package.json` still declares the project (it is not shown changing its `name`). This looks like an artifact of regenerating the lockfile in the wrong directory or from a different `package.json`. It should be reverted so the lockfile `name` matches `package.json`; otherwise `npm` may warn and the mismatch is confusing. Please confirm that the `name` in `package.json` and the lockfile agree.
2. CI grep regex vs. Pagefind output wording. Pagefind's summary line uses the template `Indexed {N} page{s}` (singular "page" for 1, "pages" for >1). The guard `grep -qE 'Indexed [1-9][0-9]+ pages'` requires a 2+-digit count immediately followed by the literal `pages`, i.e. it passes only at >=10 pages, which matches the intent. Two caveats worth a quick check:
   - If Pagefind ever emits the count with a thousands separator (e.g. `1,359`) or with ANSI color codes around the number, the `grep` would fail to match a perfectly good index and break the build. Consider stripping ANSI (`sed 's/\x1b\[[0-9;]*m//g'`) before grepping, or loosening the pattern. Given the PR reports ~1359 pages, a separator-free large integer is the current reality, but it is a brittle coupling to log formatting.
   - The error message says "fewer than 10 pages" while the comment above says the concern is indexing nothing; consider aligning the wording so the threshold intent is unambiguous.
3. `pagefind.log` handling. `npm run pagefind | tee pagefind.log` then `rm -f pagefind.log`. Because of the `&& / || {}` block and `set -e` semantics in a `run:` step, the `rm` only executes on the success path; on failure the script `exit 1`s first, leaving `pagefind.log` behind. That is harmless (ephemeral CI runner) but means the cleanup is effectively only cosmetic. Minor.
4. `npm run pagefind` script. `pagefind --site public` is correct. Note newer Pagefind versions prefer `--site` (kept) over the deprecated `--source`; good.
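The caveats in items 2 and 3 could be addressed along these lines. This is a minimal sketch, not the guard used in this PR: the function names are hypothetical, and the `Indexed N page(s)` wording is assumed from the description above rather than checked against Pagefind's source. It strips ANSI codes, tolerates a thousands separator, compares the count numerically, and removes the log on both the success and the failure path.

```shell
# Hypothetical helpers; the "Indexed N page(s)" wording is an assumption
# taken from the review text above.

# Read Pagefind log output on stdin and print the indexed page count,
# or 0 when no "Indexed N page(s)" line is found.
indexed_pages() {
  esc=$(printf '\033')
  count=$(sed "s/${esc}\[[0-9;]*m//g" \
    | sed -n -E 's/.*Indexed ([0-9][0-9,]*) pages?.*/\1/p' \
    | head -n 1 \
    | tr -d ',')
  echo "${count:-0}"
}

# Fail unless the log on stdin reports at least $1 indexed pages.
require_min_pages() {
  min=$1
  pages=$(indexed_pages)
  if [ "$pages" -lt "$min" ]; then
    echo "Pagefind indexed $pages pages, expected at least $min" >&2
    return 1
  fi
  echo "Pagefind indexed $pages pages"
}

# Run a command, keep its output in a temporary log, apply the page floor,
# and delete the log whether or not the check passes.
# Note: the exit status of the command itself is not checked here.
guard_pagefind() {
  min=$1
  shift
  log=$(mktemp)
  "$@" | tee "$log"
  status=0
  require_min_pages "$min" < "$log" || status=$?
  rm -f "$log"
  return "$status"
}
```

In a CI step this would be invoked as `guard_pagefind 10 npm run pagefind`. Comparing the parsed number instead of matching digits in the regex keeps the threshold in one place and makes the failure message report the actual count.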
Potential issues / risks
1. End-to-end search depends on `docs-ui`. The PR body correctly notes that loading Pagefind's UI bundle lives in the separate `docs-ui` repo and needs a matching change. Until that lands, the deployed site will ship a valid `public/pagefind/` index but the search bar will not be wired to it, i.e. search is effectively non-functional end-to-end after this merges alone. Recommend coordinating the merge with the `docs-ui` change (or noting the rollout order in the PR) to avoid a window of broken search in production.
2. Removed functionality: per-heading anchored results. The old indexer captured each `h1..h6` with its anchor (`url#id`) so results could deep-link to a specific section. Pagefind supports sub-results/anchored headings too, but only if the UI is configured for it. Worth confirming the `docs-ui` integration preserves the heading-level deep-linking that users currently get, otherwise this is a subtle UX regression.
3. Excluded pages. The old extension explicitly skipped pages with a leading `_` (no `page.pub.url`). Pagefind indexes whatever HTML Antora actually publishes, so behavior should be equivalent (unpublished partials are not emitted to `public/`), but worth a sanity check that no internal/partial pages leak into the index.
4. No automated test of the index. Beyond the CI page-count floor, there is no assertion that `public/pagefind/` is structurally valid or that a known term returns a hit. The page-count guard is a reasonable, pragmatic smoke test for a docs repo.
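If a slightly stronger smoke test is wanted for risks 3 and 4, something like the following could run after indexing. It is a sketch under stated assumptions: the function names are hypothetical, and the file names `pagefind.js` and `pagefind-entry.json` are assumed from Pagefind's usual output layout, not confirmed by this PR. It does not verify that a search term returns a hit.

```shell
# Hypothetical helpers; the bundle file names are an assumption about
# Pagefind's output layout and should be checked against a real build.

# Succeed only if the given directory contains a non-empty entry file and
# JavaScript API bundle.
check_pagefind_bundle() {
  dir=$1
  for f in pagefind.js pagefind-entry.json; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      return 1
    fi
  done
  echo "pagefind bundle looks complete in $dir"
}

# List published HTML files whose file name starts with an underscore,
# as a rough check that no partial pages reached the site directory.
find_underscore_pages() {
  find "$1" -type f -name '_*.html'
}
```

A CI step might call `check_pagefind_bundle public/pagefind` and fail if `find_underscore_pages public` prints anything.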
Overall: a well-executed, simplifying migration. The main blocker to verify before merge is the package-lock.json name change (item 1); the docs-ui coordination (first risk) is the key functional dependency to sequence correctly.

Summary
Migrates documentation search from a self-hosted Elasticsearch index to Pagefind, a static search library. Pagefind crawls the already-built HTML in `public/` and writes a chunked static index into `public/pagefind/`: no server, container, CORS workaround, or `ELASTICSEARCH_*` secrets required.

Changes

- `package.json`: drop `@elastic/elasticsearch`, `cheerio`, `html-entities`, `lodash`; add `pagefind` dev dependency and a `pagefind` npm script
- `site.yml` / `docs/extensions.md`: remove the `generate-index.js` Antora extension
- Delete `ext-antora/generate-index.js` (192-line ES indexer) and `es-docker-compose.yml` (local ES container)
- `.github/workflows/ci.yml`: drop `ELASTICSEARCH_*` env/secrets; run `npm run pagefind` after the Antora build so the deployed site always ships a fresh index
- `docs/build-the-docs.md`: rewrite the Search section for the Pagefind workflow

Verification

Built and indexed locally:

- `npm run antora` builds cleanly with no reference to the removed extension
- `npm run pagefind` indexes 1359 pages / 21916 words and writes `public/pagefind/` including the search UI bundle

🤖 Generated with Claude Code