Skip to content

Preprocessing API rework #56

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
139 changes: 139 additions & 0 deletions text/preprocessing-api-rework.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,139 @@
- Start Date: 2021-07.10
- RFC PR: (leave this empty)
- Svelte Issue: (leave this empty)

# Preprocessing API rework

## Summary

Introduce a new preprocessing API which is simpler but allows for more flexibility

## Motivation

The current preprocessing API is both a little hard to grasp at first and not flexible enough to satisfy more advanced use cases. Its problems:

- Ordering is somewhat arbitrary, as it runs markup preprocessors first, then script/style. Preprocessors that want to be executed at a different point are forced to do dirty workarounds. It also lead to a PR implementing somewhat of a escape hatch for this (https://github.com/sveltejs/svelte/pull/6031)
- Script/Style preprocessors may want to remove attributes, right now it's not possible to do (unless they become a markup preprocessor and do it themselves somehow) (https://github.com/sveltejs/svelte-preprocess/issues/260, https://github.com/sveltejs/svelte/issues/5900)
- In general, the distinction between markup/script/style forces a decision on the preprocessor authors that may lock them in to a suboptimal solution

The solution for a better preprocessing API therefore should be

- easier to grasp and reason about
- execute preprocessors predictably
- provide more flexibility

## Detailed design

The preprocessor API no longer is split up into three parts. Instead of expecting an object with `{script, style, markup}` functions, it expects an object with these properties:

```typescript
{
name: 'a string', // sometimes it's good to know from a preprocessor library persepective what preprocessor you are dealing with
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
name: 'a string', // sometimes it's good to know from a preprocessor library persepective what preprocessor you are dealing with
name: 'a string', // sometimes it's good to know from a preprocessor library perspective what preprocessor you are dealing with

preprocess: function,
// possibly more options later on
}
```

The `preprocess` function to which the complete source code is handed looks like this:

```typescript
result: {
code: string,
dependencies: Array<string>
} = await svelte.preprocess(
(input: { code: string, filename: string }) => Promise<{
code: string,
dependencies?: Array<string>,
map?: any
}>
)
```

Additionally, `svelte/preprocess` exports new utility functions which essentially establish the current behavior:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having the preprocess function live in svelte/compiler but having the utility functions live in svelte/preprocess seems a bit confusing to me. I'm not sure what I'd suggest though. Having preprocess in svelte/compiler be the "old" preprocessor and preprocess in svelte/preprocessor be the "new" preprocessor would eliminate the need for different behavior based on what types of arguments were passed (and then we'd eventually get rid of the old one in Svelte 4), but that might be even more confusing.

Having the compiler and the preprocessor be separate concerns (exported from separate modules) seems appealing to me, but I'm not sure how valuable of a distinction that would be to force users to make. Almost everyone is going to be interacting with this with svelte.config.js, not by directly calling preprocess and compile.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that even though most users will interact with the whole system inside the config file, there is still merit to the idea of moving everything new into svelte/preprocess for the sake of internal organization. I like the idea.


### extractStyles

```typescript
function extractStyles(code: string): Array<{
start: number;
end: number;
location: "top" | "nested" | "unknown";
content: { text: string; start: number; end: number };
attributes: Array<{
name: string;
value: string;
start: number;
end: number;
}>;
}>;
```

extracts the style tags from the source code, each with start/end position, content and attributes. `location` would require a parser which has to make some assumptions about the code being "standard JS/HTML-syntax compliant" (in the sense of opening closing brackets match etc). Pending PR for such a parser here: https://github.com/sveltejs/svelte/pull/6611 . We could make using that parser the default, with a fallback to the old regex-approach in case of an error, in which case `location` would be `unknown`, not `top` or `nested`. Preprocessors could use this to only transpile top level scripts.

### extractScripts

Same as `extractStyles` but for scripts

### replaceInCode

```typescript
function replaceInCode(
code: string,
replacements: Array<{ code: string; start: number; end: number; map?: SourceMap }>
): { code: string; map: SourceMap };
```

Performs replacements at the specified positions. If a map is given, that map is adjusted to map whole content, not just the part that was processed. The result is the replaced code along with a merged map.

These three functions would make it possible to reimplement a script preprocessor like this:

```javascript
function transformStuff(...) { /* user provided function */ }
function getDependencies(...) { /* user provided function */ }
function script({code}) {
const scripts = extractScripts(code);
const replacements = scripts.map(transformStuff);
return {
...replaceInCode(code, replacements),
dependencies: getDependencies(replacements)
}
}
```

Using these three functions, we could also construct convenience functions like `replaceInScript` which would make it possible for preprocessor authors to do `return replaceInScript(code, transformStuff)`. What functions exactly to provide is up for discussion, the point is that we should provide primitives to ensure more flexibility for advanced use cases.

Since preprocessors are now "just" functions, there's no ordering headache anymore, preprocessors are invoked in order, giving full control for composability.

### Roadmap, backwards-compatibility

This new functionality could be implemented in Svelte 3, where the `preprocess` function checks if the passed in preprocessor is an object (current API) or a function (proposed API). In Svelte 4, this would become the default, and we could provide a function for preprocessors that don't support the new API yet.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps a clarification is in order that we'd provide a function which developers could use in their config files to wrap preprocessors that have not yet taken the necessary steps to adopt the new API.

As written, you might interpret it as "we will provide a function to preprocessor authors".

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, there needs to be more work done here to explain what we do in Svelte 3, I think.

If you pass an array of preprocessors, some of which use the new API, and some of which use the old API, then what do you do about ordering? If preprocessor A uses the old API, but B uses the new API, you've got a condition where B is expecting that A will run first. But if A uses script or style, it won't have done anything when B runs. In this case, my ugly workaround is still necessary to get PostCSS turned into proper CSS before Svelte Image does its thing.

Perhaps we still need to have developers (not preprocessor authors) signal that they want to opt in to the new API via some config setting. Then we can just force them to use the provided function to wrap old-style preprocessors (as they would have to do in Svelte 4 anyway).

So with no "experimental" flag set, we respect either style of API, but preserve the old ordering mechanism: any new-style API usage will be ordered in the same fashion as markup is today. Additionally, when we encounter the new-style API, we log a warning message to the terminal with a brief message about ordering and a link to visit for more info.

When the experimental flag set to true, we remove the old-style API entirely which fixes ordering automatically. Additionally, when we encounter the old-style API, we throw an error with an explanation that "Preprocessor {name} attempted to use the deprecated API, but you've opted in to the new style. Either wrap your preprocessor in the legacyPreprocessor function or opt out of the new behavior." Also with a link to read more.


```javascript
export function legacyPreprocessor(preprocessor) {
return async ({ code, filename }) => {
const processedBody = await (preprocessor?.markup({ content, filename }) ??
Promise.resolve({ code: content }));

// .. etc
return mergeParts(processedBody, processedScript, processedStyle);
};
}
```

## How we teach this

- Adjust docs
- Write a blog post outlining the need for the change, what new capabilities it unlocks, and a migration path

## Drawbacks

None that I can think of right now

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main drawback is that this is planned to be a breaking change in Svelte 4. But that is mitigated by providing the legacyPreprocessor function.

There are other potential drawbacks... but I cannot guess at them because I do not know the original thinking that led to doing three separate passes on preprocessors to begin with. That decision led to this current ordering problem which is addressed in this RFC, but it ostensibly was solving some other problem at the time. In theory, this change would re-introduce whatever that problem was.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked the other maintainers about the decision for the API back then and there was no conscious "this is good because X" reason for that. The main argument was to provide some ease of use, but that would be achieved just as well with the helper functions.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am hoping that this change is a slam dunk, then.


## Alternatives

None that I can think of right now

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ordering

  1. just a config flag: opt in to order preprocessors top to bottom with no special second or third pass on script or style.
  2. Some nesting mechanism as I created in my PR that you linked at the top. (For those who have not been following, I personally prefer this RFC solution to the PR that I originally opened)

Attr adjustments within script/style

  1. You could fix the inability to edit tag information in scripts and styles by adding another optional property in the return of either script or style that we'd then act on internally. The bit below is adapted from current Svelte docs. See the commented lines which indicate the change I mean.
type Preprocessor = {
	markup?: (input: { content: string, filename: string }) => Promise<{
		code: string,
		dependencies?: Array<string>
	}>,
	script?: (input: { content: string, markup: string, attributes: Record<string, string>, filename: string }) => Promise<{
		code: string,
		dependencies?: Array<string>
		// Add the following property
		attributes?: Record<string, string>
	}>,
	style?: (input: { content: string, markup: string, attributes: Record<string, string>, filename: string }) => Promise<{
		code: string,
		dependencies?: Array<string>
		// Add the following property
		attributes?: Record<string, string>
	}>
}
  1. We could still provide a helper function for grabbing script and style directly out of markup as described in this RFC, but without changing the underlying API at all. Preprocessor authors could more easily transition from style or script to markup using the function we provide.

As for these alternatives, this RFC feels like the better, more thorough approach to me.


## Unresolved questions

- What about preprocessors inside mustache tags? Should the Svelte parser be adjusted for an opt-in parsing mode where a Javascript-like syntax for mustache tags is assumed to extract its contents and provide this as another utility function for preprocessing? (https://github.com/sveltejs/svelte/issues/4701) - _Update_: The pending parser PR would partly adress this by giving back location infos for mustache tags as well: https://github.com/sveltejs/svelte/pull/6611
- Other preprocessor utils that are of use, for example doing the opposite of `extractScripts`/`extractStyles` and provide a `replaceMarkup` function (https://github.com/sveltejs/svelte/issues/5005)?