Re: edge issues with DOM, text/html, and xml serializations [was Re: handling fallback content for still images]

On Jul 9, 2007, at 9:34 AM, James Graham wrote:

> Robert Burns wrote:
>> Despite some confusion on these issues, there isn't a single right  
>> way to do
>> these things and the sooner we can acknowledge that the easier our  
>> task will
>> be.
> If you're talking about XML parsing there really is only one way to  
> do it; the DOM you get is determined by the XML spec. Any browser  
> that does something different has a bug.

I've been working primarily with XML for nearly a year now (CSS, DOM,
and translation), and I can tell you it's not as unambiguous as you
might think. There's definitely ambiguity, and there's room to clear
it up. The XML spec is clearest on well-formedness. After that,
there's wiggle room.

>>>>> It would be really useful if, any time you want to talk about  
>>>>> the parsing-behavior of current UAs, you could post the source  
>>>>> of some example input and DOM produced from that input.
>>>> Several posts in this discussion included source samples and  
>>>> discussed
>>>> the results. Many of us have DOM viewers built into our browsers.
>>> So do I. The point is it helps to make sure everyone is on the  
>>> same page if
>>> we have a testcase in a form where anyone reading the message is  
>>> sure of a)
>>> what, exactly, has been tested and b) what the results are. The  
>>> Live DOM
>>> Viewer makes this easy.
>> However, didn't you say that the live DOM viewer is for text/html?
>> This entire thread has mostly been focussed on the xml  
>> serialization, so the live
>> DOM viewer wouldn't work for this thread.
> XML is less interesting in general because a) it's much less widely  
> used and b) the correct behavior of XML parsers is generally well  
> understood (at least compared to HTML parsers).

The conversation has been about much more than parsing. If you narrow
it down to parsing, then yes, there is less room for ambiguity (though
still some). However, we're talking about more than just parsing
here, so the issues are compounded. We've been discussing
serialization, de-serialization, conversion, rendering, applying CSS,
etc. The issues are not as straightforward as you might think.

>> We all understand the way the
>> text/html is processed, however, there have been some surprises on
>> the XML
>> side (for example Safari's processing XML in the same way as
>> text/html and
>> inserting an implied <tbody> into the DOM). Maciej also said the
>> Opera and
>> Opera and
>> (eventually new) WebKit way of processing this will be to insert  
>> an anonymous
>> tbody. CSS has anonymous boxes. However, it doesn't have an  
>> anonymous tbody
>> box. Either Maciej is confusing these two things or there's a new  
>> concept
>> being introduced here: a CSS inferred tbody box (to coin a phrase).
> As far as I understand it (and I understand CSS very little), all  
> that happens is that some extra "anonymous" elements are inserted  
> in the CSS render tree. The DOM does not change at-all.

Well, an update to Safari will make that so. However, it's worth
mentioning that WebKit rarely gets things both 1) wrong per spec and
2) different from the others. So it's an interesting fact that in this
area they did (or at least they seem to have; I'm not saying there's
anything for the project to be ashamed of, only that there may be
interesting thought processes going on there that should not be
dismissed out of hand). So it raises an interesting question: how did
that mistake come about?
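
For anyone who wants to check the XML side of the tbody question
outside a browser, here is a minimal sketch in Python. It assumes the
standard-library xml.dom.minidom parser as a stand-in for a generic XML
parser; it is not any browser's actual code path. The sample markup and
the outline() helper are hypothetical, made up for illustration. It
shows the source next to the DOM produced from it, with no tbody
appearing in the XML-parsed tree.

```python
from xml.dom import minidom

# Hypothetical test input: an XHTML table with no <tbody> in the source.
SOURCE = (
    '<table xmlns="http://www.w3.org/1999/xhtml">'
    '<tr><td>cell</td></tr>'
    '</table>'
)


def outline(node, depth=0):
    """Return an indented outline of the element and text nodes under node."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append("  " * depth + node.tagName)
    elif node.nodeType == node.TEXT_NODE:
        lines.append("  " * depth + '"' + node.data + '"')
    for child in node.childNodes:
        lines.extend(outline(child, depth + 1))
    return lines


doc = minidom.parseString(SOURCE)
table = doc.documentElement

# The DOM as built by the XML parser, one node per line.
tree = outline(table)
print("\n".join(tree))

# An XML parser adds no elements that are absent from the source.
has_tbody = len(doc.getElementsByTagName("tbody")) > 0
print("tbody in DOM:", has_tbody)
```

Feeding the same markup to a text/html parser is where an implied tbody
would be expected to show up, so posting both outlines side by side
would make the difference concrete.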

>> I asked this before, but I'll try again. Can I use this for XML  
>> serialized
>> and delivered documents?
> Sadly the XML version of the Live DOM Viewer seems to be broken.  
> However if you think it is useful to know how XML parsers act I can  
> try to hook an XML parser up to the HTML5 Parse Tree Viewer.

I think it's more interesting to be able to test the various XML
engines (because they're not necessarily as uniform as you might
think). Perhaps I'll take a look at the XML port of Live DOM Viewer
if I can find the time. That would be more useful.

Take care,

Received on Monday, 9 July 2007 14:48:53 UTC