[whatwg] XHTML5 DOM building and IDness

On Thu, 2 Nov 2006, Henri Sivonen wrote:
> The spec says:
> > The rules for parsing XML documents (and thus XHTML documents) into DOM
> > trees are covered by the XML and Namespaces in XML specifications, and are
> > out of scope of this specification.
> 
> However, the spec says the following about the id attribute:
>
> > If the value is not the empty string, user agents must associate the element
> > with the given value (exactly) for the purposes of ID matching (e.g. for
> > selectors in CSS or for the getElementById() method in the DOM).
>
> [...] there is a piece of code somewhere between the XML processor and 
> the resulting DOM tree that is analogous to an xml:id processor and that 
> assigns IDness to attributes that are not in a namespace, have the local 
> name "id" and belong to elements in the XHTML namespace.

Right, that piece of code is the XHTML UA. Is that a problem? Why would 
the rules resulting from HTML element semantics have to be dealt with by 
the lower level layers?
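
To make the rule quoted above concrete, here is a minimal sketch of such a 
layer sitting on top of a generic XML parser. It is an illustration only, 
not anything the spec defines: the function name build_id_index is 
hypothetical, Python's xml.etree.ElementTree stands in for "the XML 
processor", and letting the first element in document order win for a 
duplicate ID is an assumption made here.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_id_index(xml_text):
    """Map ID values to elements, using the rule from the quoted text.

    Hypothetical helper: an attribute gets IDness only if it is in no
    namespace, has the local name "id", and sits on an element in the
    XHTML namespace. Empty values are ignored.
    """
    root = ET.fromstring(xml_text)
    index = {}
    for element in root.iter():
        # ElementTree spells namespaced names as "{namespace}local".
        if not isinstance(element.tag, str):
            continue
        if not element.tag.startswith("{" + XHTML_NS + "}"):
            continue
        # A plain "id" key means the attribute is in no namespace;
        # a namespaced one would be stored as "{namespace}id".
        value = element.attrib.get("id")
        if value:
            # Assumption: the first match in document order wins.
            index.setdefault(value, element)
    return index


if __name__ == "__main__":
    sample = (
        '<html xmlns="http://www.w3.org/1999/xhtml" xmlns:o="urn:other">'
        '<body><p id="a">text</p><o:thing id="b"/></body></html>'
    )
    print(sorted(build_id_index(sample)))
```

The XML parser itself knows nothing about IDness here; the XHTML-specific 
rule lives entirely in the layer above it.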


> The second quote implies that the first quote is not the full story and 
> building a DOM tree from an XHTML document byte stream is not entirely 
> covered by the XML and Namespaces in XML specifications [...]

"Not entirely" is a polite way of putting it. There's a huge gaping hole 
between the XML spec and the DOM spec, with no actual definition anywhere 
that says how you get from one to the other -- there's no equivalent of 
the HTML parser spec for XML/DOM. It's only because for most things 
there's an "obvious" mapping that the implementations are interoperable, 
IMHO. This is one reason why I've punted on defining document.write() for 
XML -- without a strict parser spec that defines at which stage the DOM is 
updated, there's no clear definition of how you insert things into the 
parser's input stream, for example.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Thursday, 14 June 2007 16:51:11 UTC