Order of declaring namespace for attributes and using said namespace should not matter. #22

Open
@jbrayfaithlife


Bug Report

According to the XML namespaces [spec](https://www.w3.org/TR/2006/REC-xml-names11-20060816/#sec-namespaces):

The namespace prefix, unless it is xml or xmlns, must have been declared in a namespace declaration attribute in either the start-tag of the element where the prefix is used or in an ancestor element (i.e. an element in whose content the prefixed markup occurs). Furthermore, the attribute value in the innermost such declaration must not be an empty string.

Though the second is admittedly harder to read, these two declarations should both be valid uses of the prefix:
<div xmlns:epub="http://www.idpf.org/2007/ops" epub:type="footnote">Test</div>
<div epub:type="footnote" xmlns:epub="http://www.idpf.org/2007/ops">Test</div>

Unfortunately, the parser processes attributes in the order they appear on the element. The first example therefore resolves the prefix to the expected namespace URI, but the second does not, because epub:type is processed before xmlns:epub has been seen.

Prerequisites

  • [/] Can you reproduce the problem in a MWE?
  • [/] Are you running the latest version of AngleSharp?
  • [/] Did you check the FAQs to see if that helps you?
  • [/] Are you reporting to the correct repository? (there are multiple AngleSharp libraries, e.g., AngleSharp.Css for CSS support)
  • [/] Did you perform a search in the issues?

For more information, see the CONTRIBUTING guide.

Description

Namespace declarations need to be parsed before other attributes on an element.

Steps to Reproduce

// Dump() is the LINQPad output helper; replace it with Console.WriteLine elsewhere.
// Case 1: the namespace declaration precedes the prefixed attribute.
var document = new XmlParser().ParseDocument(@"<xml xmlns:epub=""http://www.idpf.org/2007/ops"" epub:type=""noteref"">1</xml>");
var root = document.DocumentElement;
root.Attributes.First(att => att.LocalName == "type").NamespaceUri.Dump();

// Case 2: the prefixed attribute precedes the namespace declaration.
document = new XmlParser().ParseDocument(@"<xml epub:type=""noteref"" xmlns:epub=""http://www.idpf.org/2007/ops"" >1</xml>");
root = document.DocumentElement;
root.Attributes.First(att => att.LocalName == "type").NamespaceUri.Dump();

Expected behavior: both Dump() calls should print out http://www.idpf.org/2007/ops.

Actual behavior: the first call to Dump() outputs the correct URI; the second outputs null.

Environment details: Win 10 .NET 6.0.15

Possible Solution

Two approaches are possible, both centered on the loop that creates the element's attributes:

for (var i = 0; i < tagToken.Attributes.Count; i++)
{
    var attr = tagToken.Attributes[i];
    var item = CreateAttribute(attr.Key, attr.Value.Trim());
    element.AddAttribute(item);
}

First, we could make sure to process any namespace declarations before any other attributes, which seems like the simplest approach. I have a PR to this effect that I will put up for your review.
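
As a rough illustration of this first approach (not the code from the PR), the sketch below reorders a list of attribute name/value pairs so that namespace declarations come first. The class and method names are hypothetical, and plain `KeyValuePair<string, string>` items stand in for the parser's attribute tokens; nothing here is AngleSharp API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, for illustration only.
public static class AttributeOrdering
{
    // True for "xmlns" (default namespace) and "xmlns:prefix" declarations.
    public static bool IsNamespaceDeclaration(string name) =>
        name == "xmlns" || name.StartsWith("xmlns:", StringComparison.Ordinal);

    // Stable reorder: declarations first, then all other attributes,
    // each group keeping its original source order.
    public static List<KeyValuePair<string, string>> DeclarationsFirst(
        IEnumerable<KeyValuePair<string, string>> attributes)
    {
        var list = attributes.ToList();
        return list.Where(a => IsNamespaceDeclaration(a.Key))
                   .Concat(list.Where(a => !IsNamespaceDeclaration(a.Key)))
                   .ToList();
    }
}
```

One consequence worth weighing: if attributes are also added to the element in this reordered sequence, the element's attribute order no longer matches the source. Reordering only for the purpose of resolving prefixes, or processing declarations in a first loop and the remaining attributes in a second, would avoid that.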

Second, we could make a second pass over the created attributes, re-resolving their namespaces after all the attributes have been processed.
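
The second approach might look something like the following sketch, which is deliberately simplified: it only considers declarations on the element itself (a real implementation would also consult ancestor elements and the reserved xml prefix), and `TwoPassResolver` is a hypothetical name operating on plain name/value pairs rather than AngleSharp types.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class TwoPassResolver
{
    // Maps each attribute name to its namespace URI, or null when the
    // attribute is unprefixed, is itself a declaration, or uses an
    // undeclared prefix.
    public static Dictionary<string, string?> Resolve(
        IReadOnlyList<KeyValuePair<string, string>> attributes)
    {
        const string DeclarationPrefix = "xmlns:";

        // Pass 1: collect prefix -> URI from every xmlns:prefix declaration.
        var prefixes = new Dictionary<string, string>(StringComparer.Ordinal);
        foreach (var attr in attributes)
        {
            if (attr.Key.StartsWith(DeclarationPrefix, StringComparison.Ordinal))
            {
                prefixes[attr.Key.Substring(DeclarationPrefix.Length)] = attr.Value;
            }
        }

        // Pass 2: resolve prefixed attributes now that all declarations are known.
        var resolved = new Dictionary<string, string?>(StringComparer.Ordinal);
        foreach (var attr in attributes)
        {
            string? uri = null;
            var colon = attr.Key.IndexOf(':');
            if (colon > 0 && !attr.Key.StartsWith(DeclarationPrefix, StringComparison.Ordinal))
            {
                prefixes.TryGetValue(attr.Key.Substring(0, colon), out uri);
            }
            resolved[attr.Key] = uri;
        }
        return resolved;
    }
}
```

Because the lookup table is complete before any prefix is resolved, the position of xmlns:epub relative to epub:type no longer affects the result, and the attributes keep their source order.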
