Skip to content

Conversation

shunkica
Copy link

There is still a lot of work to do here but I wanted to gauge your opinion before proceeding any further.
In particular because of the changes that I made to existing code, interaction with the existing code (particulary regarding disposables), and the bindings that were added.

Furthermore I don't really know how to get around the unsupported node types when used with NodeSet canonicalization and createNode in isVisibleCallback.

The changes needed for just the plain Document canonicalization, and Node canonicalization are minimal. Most of the changes were needed to accommodate the NodeSet canonicalization and the callback canonicalization, so it is probably worth considering if those are even needed.

Comment on lines +79 to +80
_xmlC14NDocDumpMemory
_xmlC14NExecute
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xmlC14NExecute is the superset of all the C14N apis, we shouldn't need to export other functions:

xmlC14NDocSaveTo is NodeSet specialized version of xmlC14NExecute
xmlC14NDocSave and xmlC14NDocDumpMemory are destination specialized version of xmlC14NDocSaveTo

So on top of C implemented xmlC14NExecute, we could implement our own in javascript: a Set<XmlNode>-based visible callback etc

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but at the same time xmlC14NExecute requires a callback and my assumption was that doing a callback for each Node to a JS function would be much slower.
That is why xmlC14NDocDumpMemory was my first choice. That function required no other imports and very minimal code to make it work.
Additionally since not all node types are supported in createNode, I could not figure out how to even implement this callback proper.y

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't worry about the performance. "Premature optimization is the root of all evil". There may be many factors affecting whether it is a concern:

  • Callback from wasm to javascript may not be that slow
  • Callback may not be a significant portion of entire C14NDump (i.e. other parts spend more time)
  • C14NDump may be only a small portion of entire execution
  • It may not be very often to dump only some nodes instead of all nodes (passing NULL for all nodes)

Hmm, comparing node is a problem. Besides the problem of unsupported types of node, createNode will create a new wrapper and directly comparing two wrappers doesn't make any sense.

_xmlC14NDocDumpMemory
_xmlC14NExecute
_xmlBufferCreate
_xmlOutputBufferCreateBuffer
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use xmlOutputBufferCreateIO instead, since we are already using XmlOutputBufferHandler as well as XmlStringOutputBufferHandler.

Refer to XmlDocument.save

* Don't reuse this buffer.
*/
@disposeBy(libxml2._xmlBufferFree)
export class DisposableXmlOutputBuffer extends XmlDisposable<DisposableXmlOutputBuffer> {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need to expose the XmlOutputBuffer to javascript world? Its scope is just in one function.

And its implementation is incorrect.

You can refer to other subclass of XmlDisposable. Their instantiation has to through createInstance to ensure it can be GCed at the worst case.

Comment on lines +111 to +123
tempDoc = xmlNewDoc();
if (!tempDoc) {
throw new XmlError('Failed to create new document for subtree');
}

// Make a deep copy of the node (1 = recursive copy)
const copiedNode = xmlCopyNode(options.node._nodePtr, 1);
if (!copiedNode) {
throw new XmlError('Failed to copy subtree node');
}

// Set the copied node as the root element of the new document
xmlDocSetRootElement(tempDoc, copiedNode);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get it, why there need to copy the node before serializing it?

Copy link
Author

@shunkica shunkica Aug 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The easiest way I found to canonicalize a subtree of an element, which is the most common way C14N is used, is to make the element a root element of a document and just canonicalize that document. However later I found the xmlDOMWrapCloneNode function which seems to be the correct one to use here instead of xmlCopyNode.
The caveat is that when exclusive canonicalization with inclusive namespace prefixes is used, you must manually walk to the root of the original document and copy any inclusive namespaces you encounter to the cloned document.

To do the same thing with xmlC14NExecute you would need to create a isNodeInSubtree callback, where each node would have to walk to root, and check if any of the parent nodes is the node X we are canonicalizing. Aditionally there is some special handling of namespace nodes too that would need to be done (see the static function xmlC14NIsNodeInNodeset)

My first thought was to implement this direcly in C, eg something like:

static int
isNodeInSubtree(void *user_data, xmlNodePtr node, xmlNodePtr parent) {
    if (node == NULL || user_data == NULL) return 0;

    xmlNodePtr target = (xmlNodePtr) user_data;
    xmlNodePtr cur = node;

    while (cur != NULL) {
        if (cur == target) return 1;
        cur = cur->parent;
    }
    return 0;
}

But I could not find a way to "inject" this function without directly modifying the libxml2 code, which would not be very maintainable.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

libxml2-wasm is just a wrapper of libxml2, IMO it needn't to add too much functionality unless it is javascript platform specific.

The original library provides only two flavors to define which node to be included: a pre-defined node set, or a callback function. None of them can naturally support the requirement of "canonicalize a subtree of an element" standalone.

So for libxml2-wasm, the API for C14N could just follow this pattern, except that,

  • the node set is in Set<XmlNode> type for javascript users
  • the implementation is calling xmlC14NExecute.

The implementation is like,

  • convert Set<XmlNode> to Set<number>
  • call xmlC14NExecute with a callback looking up in the Set<number>

Don't worry about performance - the performance of C impl is not very good either: it performs a linear lookup in the NodeSet

Copy link
Owner

@jameslan jameslan Aug 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For C14N a subtree, it either

  • Open an issue on project libxml2, to add such a functionality
  • Up to caller to do either
    • ancestor search
    • extract the subtree

Libxml2-wasm supports ancestor search right now, may need to expose the copy functionality.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants