-
Notifications
You must be signed in to change notification settings - Fork 6
wip: add support for C14N #119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
_xmlC14NDocDumpMemory | ||
_xmlC14NExecute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
xmlC14NExecute
is the superset of all the C14N apis, we shouldn't need to export other functions:
xmlC14NDocSaveTo
is NodeSet specialized version of xmlC14NExecute
xmlC14NDocSave
and xmlC14NDocDumpMemory
are destination specialized version of xmlC14NDocSaveTo
So on top of C implemented xmlC14NExecute
, we could implement our own in javascript: a Set<XmlNode>
-based visible callback etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes but at the same time xmlC14NExecute
requires a callback and my assumption was that doing a callback for each Node to a JS function would be much slower.
That is why xmlC14NDocDumpMemory
was my first choice. That function required no other imports and very minimal code to make it work.
Additionally since not all node types are supported in createNode
, I could not figure out how to even implement this callback proper.y
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't worry about the performance. "Premature optimization is the root of all evil". There may be many factors affecting whether it is a concern:
- Callback from wasm to javascript may not be that slow
- Callback may not be a significant portion of entire C14NDump (i.e. other parts spend more time)
- C14NDump may be only a small portion of entire execution
- It may not be very often to dump only some nodes instead of all nodes (passing NULL for all nodes)
Hmm, comparing node is a problem. Besides the problem of unsupported types of node, createNode
will create a new wrapper and directly comparing two wrappers doesn't make any sense.
_xmlC14NDocDumpMemory | ||
_xmlC14NExecute | ||
_xmlBufferCreate | ||
_xmlOutputBufferCreateBuffer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use xmlOutputBufferCreateIO
instead, since we are already using XmlOutputBufferHandler
as well as XmlStringOutputBufferHandler
.
Refer to XmlDocument.save
* Don't reuse this buffer. | ||
*/ | ||
@disposeBy(libxml2._xmlBufferFree) | ||
export class DisposableXmlOutputBuffer extends XmlDisposable<DisposableXmlOutputBuffer> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we need to expose the XmlOutputBuffer
to javascript world? Its scope is just in one function.
And its implementation is incorrect.
You can refer to other subclass of XmlDisposable
. Their instantiation has to through createInstance
to ensure it can be GCed at the worst case.
tempDoc = xmlNewDoc(); | ||
if (!tempDoc) { | ||
throw new XmlError('Failed to create new document for subtree'); | ||
} | ||
|
||
// Make a deep copy of the node (1 = recursive copy) | ||
const copiedNode = xmlCopyNode(options.node._nodePtr, 1); | ||
if (!copiedNode) { | ||
throw new XmlError('Failed to copy subtree node'); | ||
} | ||
|
||
// Set the copied node as the root element of the new document | ||
xmlDocSetRootElement(tempDoc, copiedNode); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get it, why there need to copy the node before serializing it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The easiest way I found to canonicalize a subtree of an element, which is the most common way C14N is used, is to make the element a root element of a document and just canonicalize that document. However later I found the xmlDOMWrapCloneNode
function which seems to be the correct one to use here instead of xmlCopyNode
.
The caveat is that when exclusive canonicalization with inclusive namespace prefixes is used, you must manually walk to the root of the original document and copy any inclusive namespaces you encounter to the cloned document.
To do the same thing with xmlC14NExecute
you would need to create a isNodeInSubtree
callback, where each node would have to walk to root, and check if any of the parent nodes is the node X we are canonicalizing. Aditionally there is some special handling of namespace nodes too that would need to be done (see the static function xmlC14NIsNodeInNodeset
)
My first thought was to implement this direcly in C, eg something like:
static int
isNodeInSubtree(void *user_data, xmlNodePtr node, xmlNodePtr parent) {
if (node == NULL || user_data == NULL) return 0;
xmlNodePtr target = (xmlNodePtr) user_data;
xmlNodePtr cur = node;
while (cur != NULL) {
if (cur == target) return 1;
cur = cur->parent;
}
return 0;
}
But I could not find a way to "inject" this function without directly modifying the libxml2 code, which would not be very maintainable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
libxml2-wasm is just a wrapper of libxml2, IMO it needn't to add too much functionality unless it is javascript platform specific.
The original library provides only two flavors to define which node to be included: a pre-defined node set, or a callback function. None of them can naturally support the requirement of "canonicalize a subtree of an element" standalone.
So for libxml2-wasm, the API for C14N could just follow this pattern, except that,
- the node set is in
Set<XmlNode>
type for javascript users - the implementation is calling
xmlC14NExecute
.
The implementation is like,
- convert
Set<XmlNode>
toSet<number>
- call
xmlC14NExecute
with a callback looking up in theSet<number>
Don't worry about performance - the performance of C impl is not very good either: it performs a linear lookup in the NodeSet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For C14N a subtree, it either
- Open an issue on project libxml2, to add such a functionality
- Up to caller to do either
- ancestor search
- extract the subtree
Libxml2-wasm supports ancestor search right now, may need to expose the copy functionality.
There is still a lot of work to do here but I wanted to gauge your opinion before proceeding any further.
In particular because of the changes that I made to existing code, interaction with the existing code (particulary regarding disposables), and the bindings that were added.
Furthermore I don't really know how to get around the unsupported node types when used with NodeSet canonicalization and createNode in
isVisibleCallback
.The changes needed for just the plain Document canonicalization, and Node canonicalization are minimal. Most of the changes were needed to accommodate the NodeSet canonicalization and the callback canonicalization, so it is probably worth considering if those are even needed.