wip: add support for C14N #119

shunkica · 2025-08-25T10:54:38Z

There is still a lot of work to do here but I wanted to gauge your opinion before proceeding any further.
In particular because of the changes that I made to existing code, interaction with the existing code (particulary regarding disposables), and the bindings that were added.

Furthermore I don't really know how to get around the unsupported node types when used with NodeSet canonicalization and createNode in isVisibleCallback.

The changes needed for just the plain Document canonicalization, and Node canonicalization are minimal. Most of the changes were needed to accommodate the NodeSet canonicalization and the callback canonicalization, so it is probably worth considering if those are even needed.

jameslan · 2025-08-26T01:53:24Z

binding/exported-functions.txt

+_xmlC14NDocDumpMemory
+_xmlC14NExecute


xmlC14NExecute is the superset of all the C14N apis, we shouldn't need to export other functions:

xmlC14NDocSaveTo is NodeSet specialized version of xmlC14NExecute
xmlC14NDocSave and xmlC14NDocDumpMemory are destination specialized version of xmlC14NDocSaveTo

So on top of C implemented xmlC14NExecute, we could implement our own in javascript: a Set<XmlNode>-based visible callback etc

Yes but at the same time xmlC14NExecute requires a callback and my assumption was that doing a callback for each Node to a JS function would be much slower.
That is why xmlC14NDocDumpMemory was my first choice. That function required no other imports and very minimal code to make it work.
Additionally since not all node types are supported in createNode, I could not figure out how to even implement this callback proper.y

Don't worry about the performance. "Premature optimization is the root of all evil". There may be many factors affecting whether it is a concern:

Callback from wasm to javascript may not be that slow

Callback may not be a significant portion of entire C14NDump (i.e. other parts spend more time)

C14NDump may be only a small portion of entire execution

It may not be very often to dump only some nodes instead of all nodes (passing NULL for all nodes)

Hmm, comparing node is a problem. Besides the problem of unsupported types of node, createNode will create a new wrapper and directly comparing two wrappers doesn't make any sense.

jameslan · 2025-08-26T03:33:22Z

binding/exported-functions.txt

+_xmlC14NDocDumpMemory
+_xmlC14NExecute
+_xmlBufferCreate
+_xmlOutputBufferCreateBuffer


Use xmlOutputBufferCreateIO instead, since we are already using XmlOutputBufferHandler as well as XmlStringOutputBufferHandler.

Refer to XmlDocument.save

jameslan · 2025-08-26T03:53:01Z

src/libxml2.mts

+ * Don't reuse this buffer.
+ */
+@disposeBy(libxml2._xmlBufferFree)
+export class DisposableXmlOutputBuffer extends XmlDisposable<DisposableXmlOutputBuffer> {


Why we need to expose the XmlOutputBuffer to javascript world? Its scope is just in one function.

And its implementation is incorrect.

You can refer to other subclass of XmlDisposable. Their instantiation has to through createInstance to ensure it can be GCed at the worst case.

jameslan · 2025-08-26T04:01:17Z

src/c14n.mts

+        tempDoc = xmlNewDoc();
+        if (!tempDoc) {
+            throw new XmlError('Failed to create new document for subtree');
+        }
+
+        // Make a deep copy of the node (1 = recursive copy)
+        const copiedNode = xmlCopyNode(options.node._nodePtr, 1);
+        if (!copiedNode) {
+            throw new XmlError('Failed to copy subtree node');
+        }
+
+        // Set the copied node as the root element of the new document
+        xmlDocSetRootElement(tempDoc, copiedNode);


I don't get it, why there need to copy the node before serializing it?

The easiest way I found to canonicalize a subtree of an element, which is the most common way C14N is used, is to make the element a root element of a document and just canonicalize that document. However later I found the xmlDOMWrapCloneNode function which seems to be the correct one to use here instead of xmlCopyNode.
The caveat is that when exclusive canonicalization with inclusive namespace prefixes is used, you must manually walk to the root of the original document and copy any inclusive namespaces you encounter to the cloned document.

To do the same thing with xmlC14NExecute you would need to create a isNodeInSubtree callback, where each node would have to walk to root, and check if any of the parent nodes is the node X we are canonicalizing. Aditionally there is some special handling of namespace nodes too that would need to be done (see the static function xmlC14NIsNodeInNodeset)

My first thought was to implement this direcly in C, eg something like:

static int isNodeInSubtree(void *user_data, xmlNodePtr node, xmlNodePtr parent) { if (node == NULL || user_data == NULL) return 0; xmlNodePtr target = (xmlNodePtr) user_data; xmlNodePtr cur = node; while (cur != NULL) { if (cur == target) return 1; cur = cur->parent; } return 0; }

But I could not find a way to "inject" this function without directly modifying the libxml2 code, which would not be very maintainable.

libxml2-wasm is just a wrapper of libxml2, IMO it needn't to add too much functionality unless it is javascript platform specific.

The original library provides only two flavors to define which node to be included: a pre-defined node set, or a callback function. None of them can naturally support the requirement of "canonicalize a subtree of an element" standalone.

So for libxml2-wasm, the API for C14N could just follow this pattern, except that,

the node set is in Set<XmlNode> type for javascript users

the implementation is calling xmlC14NExecute.

The implementation is like,

convert Set<XmlNode> to Set<number>

call xmlC14NExecute with a callback looking up in the Set<number>

Don't worry about performance - the performance of C impl is not very good either: it performs a linear lookup in the NodeSet

For C14N a subtree, it either

Open an issue on project libxml2, to add such a functionality

Up to caller to do either

ancestor search

extract the subtree

Libxml2-wasm supports ancestor search right now, may need to expose the copy functionality.

wip: add support for C14N

0a262d7

jameslan reviewed Aug 26, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

wip: add support for C14N #119

wip: add support for C14N #119

Uh oh!

shunkica commented Aug 25, 2025

Uh oh!

jameslan Aug 26, 2025

Uh oh!

shunkica Aug 26, 2025

Uh oh!

jameslan Aug 27, 2025

Uh oh!

jameslan Aug 26, 2025

Uh oh!

jameslan Aug 26, 2025

Uh oh!

jameslan Aug 26, 2025

Uh oh!

shunkica Aug 27, 2025 •

edited

Loading

Uh oh!

jameslan Aug 30, 2025

Uh oh!

jameslan Aug 30, 2025 •

edited

Loading

Uh oh!

Uh oh!

		_xmlC14NDocDumpMemory
		_xmlC14NExecute

wip: add support for C14N #119

Are you sure you want to change the base?

wip: add support for C14N #119

Uh oh!

Conversation

shunkica commented Aug 25, 2025

Uh oh!

jameslan Aug 26, 2025

Choose a reason for hiding this comment

Uh oh!

shunkica Aug 26, 2025

Choose a reason for hiding this comment

Uh oh!

jameslan Aug 27, 2025

Choose a reason for hiding this comment

Uh oh!

jameslan Aug 26, 2025

Choose a reason for hiding this comment

Uh oh!

jameslan Aug 26, 2025

Choose a reason for hiding this comment

Uh oh!

jameslan Aug 26, 2025

Choose a reason for hiding this comment

Uh oh!

shunkica Aug 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jameslan Aug 30, 2025

Choose a reason for hiding this comment

Uh oh!

jameslan Aug 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

shunkica Aug 27, 2025 •

edited

Loading

jameslan Aug 30, 2025 •

edited

Loading