Re: xmlns in HTML5

On Jul 17, 2009, at 17:13, Shane McCarron wrote:

> Henri Sivonen wrote:
>> What you say is true for all practical purposes when talking about  
>> "the DOM" in browsers. It's not necessarily true in practice and  
>> per spec for all kinds of things called "the DOM".
> Okay.  How about per the HTML5 spec?  If it is not, then we would  
> like it to be.  To whom do we send that comment?

Per spec, what Sam said holds when the section
http://www.whatwg.org/specs/web-apps/current-work/#coercing-an-html-dom-into-an-infoset
is not invoked. When that section is invoked, what Sam said doesn't
always hold. You may safely assume that browsers won't invoke the
section, but other classes of products may.

You can send comments to public-html-comments, to public-html if you
are a participant in the HTML WG, or to the WHATWG list if you are
subscribed to that list. You can also file a bug in the W3C Bugzilla.

I don't expect
http://www.whatwg.org/specs/web-apps/current-work/#coercing-an-html-dom-into-an-infoset
to change substantially, because the rules recounted there are
necessary when you have an XML API that throws on certain things and
you can't change the API. (For example, if people want to use the JDK
DOM, the JDK DOM is what it is.)

>> The public DOM API is required to throw when setting an attribute  
>> whose namespace is null and whose local name is "xmlns:foo":
>>> NAMESPACE_ERR: Raised if the qualifiedName is malformed per the
>>> Namespaces in XML specification, if the qualifiedName has a prefix
>>> and the namespaceURI is null, if the qualifiedName has a prefix
>>> that is "xml" and the namespaceURI is different from
>>> "http://www.w3.org/XML/1998/namespace", if the qualifiedName or
>>> its prefix is "xmlns" and the namespaceURI is different from
>>> "http://www.w3.org/2000/xmlns/", or if the namespaceURI is
>>> "http://www.w3.org/2000/xmlns/" and neither the qualifiedName nor
>>> its prefix is "xmlns".
>>
>> http://www.w3.org/TR/DOM-Level-3-Core/core.html#ID-ElSetAttrNS
> Huh?  in what way does the attribute xmlns:foo="someURI" violate the  
> constraints that would cause NAMESPACE_ERR to be thrown when calling  
> createAttribute in the DOM?

Setting an attribute with namespaceURI null and localName "xmlns:foo"
violates both "if the qualifiedName has a prefix and the namespaceURI
is null" and "if the qualifiedName or its prefix is "xmlns" and the
namespaceURI is different from "http://www.w3.org/2000/xmlns/"". Note
that the DOM doesn't really allow you to pass a local name but only
allows you to pass a qualified name, so attempting to pass "foo:bar"
causes "foo" to be treated as a prefix and "bar" as the local name per
the requirements quoted above.
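
To make the failure concrete, here is a minimal sketch against the
DOM implementation that ships with the JDK via JAXP. The class and
method names (NamespaceErrDemo, throwsNamespaceErr) are made up for
illustration. With the stock JDK DOM I would expect the first call in
main to report true and the second to report false, but other DOM
implementations behind the same interfaces could check more loosely.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class; not part of any parser discussed in this thread.
class NamespaceErrDemo {
    // Creates an empty document from whatever DOM implementation JAXP provides.
    static Document newDocument() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns true if the public API rejects the (namespace, qualified name)
    // pair with NAMESPACE_ERR, false if the attribute is accepted.
    static boolean throwsNamespaceErr(String namespaceUri, String qualifiedName) {
        Document doc = newDocument();
        Element el = doc.createElementNS("http://www.w3.org/1999/xhtml", "div");
        try {
            el.setAttributeNS(namespaceUri, qualifiedName, "someURI");
            return false;
        } catch (DOMException e) {
            if (e.code == DOMException.NAMESPACE_ERR) {
                return true;
            }
            throw e; // some other DOM error; not what this sketch is about
        }
    }

    public static void main(String[] args) {
        // What the HTML5 tree builder wants to store: no namespace, name "xmlns:foo".
        System.out.println(throwsNamespaceErr(null, "xmlns:foo"));
        // What an XML parser would store for the same source text.
        System.out.println(throwsNamespaceErr("http://www.w3.org/2000/xmlns/", "xmlns:foo"));
    }
}
```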

> I mean, I agree there is such an exception, but so what?

It's tough if you parse an HTML document into a tree backed by the JDK  
DOM and it throws on you before you get to do anything with the tree.

> We are declaring an attribute and a value for that attribute.  It is  
> not malformed per the Namespaces in XML specification, the  
> namespaceURI is not null, etc.

The namespace URI is null per the HTML5 parsing algorithm and existing  
browser behavior.

> Or are you claiming that in HTML5 the namespaceURI for the prefix  
> "xmlns" is somehow different than what is required by the XML  
> Namespaces Recommendation?

I'm claiming that when an attribute spelled "xmlns:foo" occurs in the  
source, the HTML5 parsing algorithm--consistent with legacy behavior-- 
creates an attribute whose namespace URI is null and whose local name  
is "xmlns:foo".
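
For contrast, the following sketch shows the closest thing the public
DOM API will let you create: an attribute in the XMLNS namespace whose
qualified name gets split at the colon. The class and method names are
hypothetical. The point is only that the resulting (namespace, prefix,
local name) triple differs from the (null, "xmlns:foo") pair that the
HTML5 algorithm produces.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;

// Hypothetical demo class for illustration only.
class QualifiedNameSplitDemo {
    static final String XMLNS_NS = "http://www.w3.org/2000/xmlns/";

    // Creates an empty document from whatever DOM implementation JAXP provides.
    static Document newDocument() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Creates the attribute through the public API and renders it as
    // "{namespaceURI}prefix|localName" so the split is visible.
    static String describe(String namespaceUri, String qualifiedName) {
        Attr attr = newDocument().createAttributeNS(namespaceUri, qualifiedName);
        return "{" + attr.getNamespaceURI() + "}"
                + attr.getPrefix() + "|" + attr.getLocalName();
    }

    public static void main(String[] args) {
        // The DOM treats "xmlns" as the prefix and "foo" as the local name.
        System.out.println(describe(XMLNS_NS, "xmlns:foo"));
    }
}
```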

>> Therefore, to comply with the HTML 5 parsing spec, the parser needs  
>> a back door to the DOM. For example, if you set the html5.enable  
>> pref to true in a recent nightly build of Firefox trunk, you get a  
>> parser that uses such a back door.
> I don't follow this, sorry.

See http://hsivonen.iki.fi/testing-html5-parsing/

>> The Validator.nu HTML Parser exposes the Java DOM API outside  
>> browsers. It builds the tree using the public API, because it uses  
>> the JDK/JAXP DOM implementation and the JDK/JAXP doesn't specify a  
>> back door.
> But why do I need such a back door?  I am parsing the document...  
> the document will create a DOM.  The DOM will have the contents of  
> the document in it, including its elements and attributes.  Yay?

Because setting the attribute via the public API doesn't work: the
public API is required to throw.

>> To address situations like this, HTML 5 licenses the parser to drop  
>> those attributes, which is what the Validator.nu HTML Parser does  
>> by default when used with the JAXP DOM:
>>> If the XML API doesn't support attributes in no namespace that are  
>>> named "xmlns", attributes whose names start with "xmlns:", or  
>>> attributes in the XMLNS namespace, then the tool may drop such  
>>> attributes.
> Yes, I have read this text.  But since HTML5 requires normatively  
> XML Namespace support, the XML API must, perforce, support  
> attributes whose names start with "xmlns" and MUST support them in  
> the XMLNS namespace.

But the parsing algorithm, per legacy behavior, doesn't put them into
the XMLNS namespace. It puts them in no namespace for consistency with
existing browser behavior and DOM Level 2 specs. It's a mess that
Namespaces in XML created and that DOM Level 2 amplified. It's not an
HTML5-created mess.
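
As a rough sketch of what "dropping" can look like in a tree builder
that sits on top of the JDK DOM: the predicate below follows the three
cases in the spec text quoted above, and the setter skips the
attribute instead of letting the DOM throw. This is not the
Validator.nu HTML Parser's actual code. The names
XmlnsAttributeFilter, mayDrop and setUnlessDropped are invented for
this example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper; an assumption-laden sketch, not a real parser component.
class XmlnsAttributeFilter {
    static final String XMLNS_NS = "http://www.w3.org/2000/xmlns/";

    // True for the attributes the quoted spec text allows a tool to drop:
    // "xmlns" in no namespace, any name starting with "xmlns:", or
    // anything in the XMLNS namespace.
    static boolean mayDrop(String namespaceUri, String name) {
        boolean noNamespace = namespaceUri == null || namespaceUri.isEmpty();
        if (noNamespace && name.equals("xmlns")) {
            return true;
        }
        if (name.startsWith("xmlns:")) {
            return true;
        }
        return XMLNS_NS.equals(namespaceUri);
    }

    // Sets the attribute through the public API unless it is droppable.
    // Returns true if the attribute ended up on the element.
    static boolean setUnlessDropped(Element el, String namespaceUri,
                                    String name, String value) {
        if (mayDrop(namespaceUri, name)) {
            return false; // dropped before the DOM gets a chance to throw
        }
        el.setAttributeNS(namespaceUri, name, value);
        return true;
    }

    // Creates a detached XHTML-namespace element to attach attributes to.
    static Element newElement() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();
            return doc.createElementNS("http://www.w3.org/1999/xhtml", "html");
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element html = newElement();
        System.out.println(setUnlessDropped(html, null, "xmlns:foo", "someURI"));
        System.out.println(setUnlessDropped(html, null, "title", "example"));
    }
}
```

The cost of this approach is that the information in xmlns:foo is gone
from the resulting tree, which is why relying on such attributes looks
unwise for server-side use.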

> I conclude that this constraint is never applicable.  Can you  
> describe the chain of events that would make the constraint  
> applicable using the example above?

Put an attribute spelled xmlns:foo onto an HTML tag in a text/html  
document and parse it with the Validator.nu HTML Parser using the JDK  
DOM or XOM as the tree implementation.

> More importantly, how do we remove this constraint from the spec.   
> It appears to be spurious.

Removing it from the spec doesn't alter the behavior of the XML APIs  
with which this section establishes compatibility, so just removing  
the section wouldn't accomplish anything useful.

>> http://www.whatwg.org/specs/web-apps/current-work/#coercing-an-html-dom-into-an-infoset
>>
>> For server-side use, it seems unwise to rely on things that need to  
>> be mangled in order to make off-the-shelf XML tree implementations  
>> not throw.
> Why do you think an off-the-shelf XML parser would throw an  
> exception when it encounters a namespace declaration?  Surely that  
> would be a surprise to most of our loyal viewers.

The point is that there's no XML parser in use in this scenario.
There's an HTML parser and an XML tree API.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Friday, 17 July 2009 15:17:02 UTC