Design the extension entry-point ladder beyond `transform_corpus`

This issue is the home for the design discussion that came out of PR #1196's review.

## Background

PR #1196 introduces the corpus-mutation extension system with a single entry point: a script defines a `transform_corpus(corpus)` global, and the host calls it. This is the simplest possible shape (let's call it **Option 1**) and it's what shipped in that PR.

During review, Alan raised the question of whether this shape is appropriate, particularly as we add:

- More lifecycle hooks (e.g. `before_extract`, `after_render`).
- Extensions that register multiple capabilities from one file (e.g. a corpus transform and a Handlebars helper).
- Extensions that bundle non-code files (templates, assets, configuration).
- Extension enabling/disabling.

The PR ships Option 1. This issue captures the trade-off analysis and lays out the ladder of richer options we can add as needs arise.

## The four entry-point patterns

### Option 1: Reserved function names

Script defines a function with a known name (`transform_corpus`); the host calls it.

Examples: pytest, Sphinx.

Trade-off: minimal syntax. The host owns the names, so one file can expose many *different* known hooks, but a script can't add several capabilities of the same kind under names it picks itself.

My evaluation: right starting point for Mr. Docs, and it scales further than it looks. pytest recognizes dozens of `pytest_*` hooks, and a single `conftest.py` (or one Sphinx `setup()`) routinely defines many of them together. So "many capabilities in one file" is not a reason to leave this rung; the only thing it genuinely can't do is let a script name its own capabilities.
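To make the reserved-name shape concrete, here is a minimal host-side sketch in Python. It is an illustration only, not how Mr. Docs implements it: `load_hooks`, `KNOWN_HOOKS`, the dict-shaped corpus, and the script contents are all assumptions made up for the example. It shows one script exposing two known hooks while a name the host never reserved is silently ignored.

```python
# Hypothetical host-side sketch of Option 1; not the Mr. Docs implementation.
# The host owns this list: only these names are ever looked up.
KNOWN_HOOKS = ("transform_corpus", "before_extract", "after_render")


def load_hooks(script_source: str) -> dict:
    """Run the script once and collect the reserved names it defines."""
    namespace: dict = {}
    exec(script_source, namespace)  # the script only defines functions
    return {
        name: namespace[name]
        for name in KNOWN_HOOKS
        if callable(namespace.get(name))
    }


# Invented extension script: two reserved hooks plus one author-chosen name.
SCRIPT = '''
def transform_corpus(corpus):
    corpus["symbols"] = [s for s in corpus["symbols"] if not s.startswith("detail::")]
    return corpus

def after_render(page):
    return page.strip()

def my_own_generator(corpus):  # not a reserved name, so the host never sees it
    return "ignored"
'''

hooks = load_hooks(SCRIPT)
corpus = {"symbols": ["foo", "detail::bar"]}
if "transform_corpus" in hooks:
    corpus = hooks["transform_corpus"](corpus)
```

Adding a new capability on this rung means adding one entry to the host's list; the script side needs no new syntax.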

### Option 2: Top-level registration calls

Script calls `host.register_*(fn)` in top-level code; the host stores the registration and invokes the callback at the right time.

Examples: Darktable, LLVM/Clang plugins.

Trade-off: the script passes a name, so it can add capabilities the host never pre-named, like several generators with author-chosen names from one file. The price: the host must run the script just to learn those names, then keep each registered callback alive until it calls it. (GDB's pretty-printers work this way: `register_pretty_printer` stores the callable, and GDB invokes it later, once per value.)

My evaluation: not needed yet. Reserved names already give us "many capabilities per file", so that's not the reason to climb. The real trigger is wanting scripts to *name their own* capabilities, most concretely one extension that adds several named generators. Until that's a concrete need, paying the run-at-discovery and keep-alive cost is premature.
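A rough Python sketch of what this rung could look like, under stated assumptions: the `Host` class, the `register_generator(name, fn)` signature, and the generator names are hypothetical and chosen only to illustrate script-chosen names. Note the two costs mentioned above: the script has to run before the host knows which generators exist, and the host holds on to each callback until it is invoked.

```python
from typing import Callable, Dict


class Host:
    """Hypothetical host object handed to the script (Option 2 sketch)."""

    def __init__(self) -> None:
        self.generators: Dict[str, Callable[[dict], str]] = {}

    def register_generator(self, name: str, fn: Callable[[dict], str]) -> None:
        # The script picks the name; the host only checks for collisions.
        if name in self.generators:
            raise ValueError(f"generator {name!r} already registered")
        self.generators[name] = fn  # kept alive until the host calls it


# Invented extension script: top-level code registers two named generators.
SCRIPT = '''
host.register_generator("markdown", lambda corpus: "# " + corpus["name"])
host.register_generator("plain", lambda corpus: corpus["name"])
'''

host = Host()
exec(SCRIPT, {"host": host})  # discovery: the script must run to reveal names

# Later, the host invokes each stored callback at the right time.
outputs = {name: fn({"name": "mylib"}) for name, fn in host.generators.items()}
```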

### Option 3: Reserved `register` function + event emitter

Scripts export one reserved name (`register`); inside, it subscribes to host events.

Example: Antora.

Trade-off: single reserved name + familiar event pattern; adds an emitter abstraction layer.

My evaluation: probably not the right rung for Mr. Docs. Antora's pattern fits a pipeline with many extension points throughout the build; we have fewer. The emitter abstraction is overhead for our shape. We could skip Option 3 and jump from Option 2 straight to Option 4 if/when needed.
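For comparison, a minimal sketch of the `register` + emitter shape in Python. The `Emitter` class and the event names `corpus_ready` and `page_rendered` are invented for illustration; they do not mirror Antora's actual API or any Mr. Docs interface. The point is to show the extra layer: the host needs an emitter, and the script subscribes inside one reserved function.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class Emitter:
    """Tiny hypothetical event emitter owned by the host."""

    def __init__(self) -> None:
        self._listeners: DefaultDict[str, List[Callable]] = defaultdict(list)

    def on(self, event: str, fn: Callable) -> None:
        self._listeners[event].append(fn)

    def emit(self, event: str, payload) -> None:
        for fn in self._listeners[event]:
            fn(payload)


# Invented extension script: one reserved name, subscriptions inside it.
SCRIPT = '''
def register(emitter):
    emitter.on("corpus_ready", lambda corpus: corpus["symbols"].sort())
    emitter.on("page_rendered", lambda page: page.append("<!-- footer -->"))
'''

namespace: dict = {}
exec(SCRIPT, namespace)

emitter = Emitter()
namespace["register"](emitter)  # the only name the host looks up

corpus = {"symbols": ["b", "a"]}
emitter.emit("corpus_ready", corpus)
```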

### Option 4: Manifest + accompanying code

An extension is a directory: a manifest file (JSON/YAML) declares the extension name and capabilities; one or more accompanying files contain the actual logic.

Example: Claude Code skills (Markdown frontmatter + body).

Trade-off: most expressive; supports paired helpers, auxiliary files, enable/disable, configurable extensions. Requires the most infrastructure.

My evaluation: the right answer once we want enable/disable, named extensions, auxiliary files, or configurable extensions. Heaviest but most expressive. The natural top of the ladder.
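A sketch of how a manifest could carry the name, the enable/disable flag, and the capability list without running any script. Everything here is an assumption for illustration: the field names (`name`, `enabled`, `capabilities`), the capability keys, and the file names are invented, and no manifest schema has been decided.

```python
import json
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Manifest:
    """Hypothetical manifest shape; the real schema is undecided."""

    name: str
    enabled: bool = True
    # capability name -> file (relative to the extension directory)
    capabilities: Dict[str, str] = field(default_factory=dict)


def parse_manifest(text: str) -> Manifest:
    data = json.loads(text)
    if "name" not in data:
        raise ValueError("manifest needs a 'name'")
    return Manifest(
        name=data["name"],
        enabled=data.get("enabled", True),
        capabilities=dict(data.get("capabilities", {})),
    )


# Invented manifest for an extension directory.
MANIFEST_JSON = '''
{
  "name": "hide-details",
  "enabled": false,
  "capabilities": {
    "transform_corpus": "transform.js",
    "helper:shout": "helpers.js"
  }
}
'''

manifest = parse_manifest(MANIFEST_JSON)
# A disabled extension is skipped before any of its code is loaded.
to_load = manifest.capabilities if manifest.enabled else {}
```

Unlike Option 2, the host learns the extension's name and capabilities by reading data, so enable/disable can be decided before executing anything.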

## The ladder

The options aren't mutually exclusive. They form a complexity ladder:

| Rung | Pattern | What you get |
|---|---|---|
| 1 | Reserved name (Option 1) | Many fixed-name hooks per file, no ceremony, but the host owns the names |
| 2 | Registration calls (Option 2) | Script-chosen names: one extension adds several capabilities under names it picks (e.g., multiple named generators) |
| 3 | Manifest + code (Option 4) | Shared files: an extension is a directory bundling code, helpers, and assets |
| ... | ... | enable/disable, configuration schemas, ... |

PR #1196 ships rung 1. Higher rungs land as concrete use cases surface.

## Future questions to settle here

These came up in the PR review. They are not blocking PR #1196 but should inform the ladder above.

- **Paired helpers**: should one extension file expose both a corpus transform and a Handlebars helper? Reserved names already allow this (pytest and Sphinx both put many different hooks in one file), so it does *not* force rung 2; it just needs a second reserved name.
- **Auxiliary files**: should an extension be a directory with assets/templates/config, not just a script? This forces rung 3.
- **Enable/disable**: how do users opt individual extensions in or out? Likely needs a config-side knob and probably an extension name (which forces a manifest).
- **Registering generators**: should one extension add several *named* output formats (e.g., a Markdown generator)? This is the real case that needs script-chosen names, so it's what would justify rung 2 (a name-bearing `register_generator`) or a manifest that lists them, not the reserved-name rung.
- **Invariant safety**: we seem to agree that extensions should not break invariants, yet some features require breaking them. As real use cases land, this tension will need a concrete resolution (tighter allowlist, opt-in unsafe mutations, post-hoc validation, etc.).
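
Of the resolutions listed for invariant safety, post-hoc validation is the easiest to sketch. The following Python is a rough illustration under loud assumptions: the dict-shaped corpus, the single "every parent must exist" rule, and the function names are all hypothetical and stand in for whatever invariants the real corpus has.

```python
import copy


def check_invariants(corpus: dict) -> list:
    """Return a list of violations; the single rule here is an invented example."""
    known = set(corpus["symbols"])
    return [
        f"{child!r} has unknown parent {parent!r}"
        for child, parent in corpus["parents"].items()
        if parent not in known
    ]


def apply_transform(corpus: dict, transform) -> dict:
    """Run the transform on a copy and accept the result only if it validates."""
    candidate = transform(copy.deepcopy(corpus))  # never mutate the original
    problems = check_invariants(candidate)
    if problems:
        raise ValueError("; ".join(problems))
    return candidate


corpus = {"symbols": ["ns", "ns::f"], "parents": {"ns::f": "ns"}}


def drop_namespace(c: dict) -> dict:
    # Removes a symbol that another symbol still names as its parent.
    c["symbols"].remove("ns")
    return c


try:
    apply_transform(corpus, drop_namespace)
    rejected = False
except ValueError:
    rejected = True
```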

