Remove extra blank lines between list items in markdown export #1885
base: main
@bgreenlee is attempting to deploy a commit to the TypeCell Team on Vercel. A member of the Team first needs to authorize it.
I don't think this solution is sufficient. You'll see that in your test cases it also strips the blank line between lists (e.g. in `markdown/lists/basic.md`, at the boundary between the bullet list and the numbered list).
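To make the problem concrete, here is a minimal sketch that runs the PR's replacement (the regex is copied from the diff further down) against two made-up inputs. The input strings and the function name `collapseBlankLines` are hypothetical and are not taken from the actual test fixtures.

```typescript
// The post-processing step from the PR; the regex is copied from the diff.
function collapseBlankLines(markdown: string): string {
  return markdown.replace(/\n\n(?=\d+\.|-|\*)/g, "\n");
}

// Hypothetical exporter output: a bullet list followed by a numbered list.
const listsInput =
  "- Bullet one\n\n- Bullet two\n\n1. Numbered one\n\n2. Numbered two\n";
const listsOutput = collapseBlankLines(listsInput);

// Hypothetical paragraphs, the second one starting with emphasis.
const paragraphsInput = "Intro paragraph\n\n*Emphasised* paragraph\n";
const paragraphsOutput = collapseBlankLines(paragraphsInput);

// The blank line separating the two lists is removed as well.
console.log(JSON.stringify(listsOutput));
// The lookahead also matches a paragraph that merely starts with "*".
console.log(JSON.stringify(paragraphsOutput));
```

Because the lookahead only inspects the character that follows the blank line, it cannot tell an item inside a list from the first item of a new list, or from any other line that happens to start with `-`, `*`, or a number.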
I think there are two solutions here:
- Make the markdown parser smarter (we have a somewhat unusual structure of `<li><p>Text content`, which I'm sure is why the parser is tripping up)
- Simplify the HTML output so that the markdown parser doesn't have to be so smart (by removing that inner paragraph element and unpacking its children into the `<li>` element)
I took a stab at the second approach and got something that worked better:
diff --git i/packages/core/src/api/exporters/html/util/serializeBlocksExternalHTML.ts w/packages/core/src/api/exporters/html/util/serializeBlocksExternalHTML.ts
index f74757c8d..9dc25da9a 100644
--- i/packages/core/src/api/exporters/html/util/serializeBlocksExternalHTML.ts
+++ w/packages/core/src/api/exporters/html/util/serializeBlocksExternalHTML.ts
@@ -172,6 +172,14 @@ function serializeBlock<
fragment.append(list);
}
const li = doc.createElement("li");
+ //unpack p nodes into their children
+ const childNodes = Array.from(elementFragment.childNodes);
+ for (const child of childNodes) {
+ if (child.nodeName === "P") {
+ const children = Array.from(child.childNodes);
+ child.replaceWith(...children);
+ }
+ }
li.append(elementFragment);
fragment.lastChild!.appendChild(li);
} else {
diff --git i/packages/core/src/api/exporters/markdown/markdownExporter.ts w/packages/core/src/api/exporters/markdown/markdownExporter.ts
index 812f20ffb..ce82f56f9 100644
--- i/packages/core/src/api/exporters/markdown/markdownExporter.ts
+++ w/packages/core/src/api/exporters/markdown/markdownExporter.ts
@@ -25,6 +25,8 @@ export function cleanHTMLToMarkdown(cleanHTMLString: string) {
);
}
+ console.log("cleanHTMLString", cleanHTMLString);
+
const markdownString = deps.unified
.unified()
.use(deps.rehypeParse.default, { fragment: true })
@@ -37,10 +39,7 @@ export function cleanHTMLToMarkdown(cleanHTMLString: string) {
})
.processSync(cleanHTMLString);
- let result = markdownString.value as string;
-
- // Remove extra blank lines between list items
- result = result.replace(/\n\n(?=\d+\.|-|\*)/g, "\n");
+ const result = markdownString.value as string;
return result;
}
Before going with this solution, though, we'd need to double-check that things like background colors still work in the HTML output. I don't remember at the moment whether they are set on the paragraph or higher up. If we can be sure that the output isn't affected, then unpacking paragraphs may end up being the better solution, since it is more HTML-like and easier to parse.
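If it turns out that styling does live on the paragraph, one conservative option would be to unwrap only paragraphs that carry no attributes. This is a rough sketch under that assumption, not a confirmed description of where the styles are set. `ElementLike` and `isSafeToUnwrap` are hypothetical names, and in the loop from the diff `child` would first need narrowing to an element, since text nodes have no `attributes`.

```typescript
// Structural subset of a DOM Element; real Elements satisfy this shape.
interface ElementLike {
  nodeName: string;
  attributes: { length: number };
}

// Hypothetical guard: only unwrap <p> elements that have no attributes,
// so a paragraph carrying style or data attributes is left intact.
function isSafeToUnwrap(node: ElementLike): boolean {
  return node.nodeName === "P" && node.attributes.length === 0;
}

// Plain objects standing in for DOM elements, for illustration only.
const plainParagraph: ElementLike = { nodeName: "P", attributes: { length: 0 } };
const styledParagraph: ElementLike = { nodeName: "P", attributes: { length: 1 } };

console.log(isSafeToUnwrap(plainParagraph));
console.log(isSafeToUnwrap(styledParagraph));
```

The trade-off is that styled list items would keep the `<li><p>` structure and might still export with the extra blank lines, so this would only be a fallback if the styles cannot be moved onto the `<li>`.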
Makes sense. Let me know if I can help.
Fixes #1881