Re: edge issues with DOM, text/html, and xml serializations [was Re: handling fallback content for still images] from Robert Burns on 2007-07-09 (public-html@w3.org from July 2007)

From: Robert Burns <rob@robburns.com>
Date: Mon, 9 Jul 2007 07:17:20 -0500
To: James Graham <jg307@cam.ac.uk>
Cc: public-html@w3.org
Message-Id: <33266A45-C2CC-4076-AEF7-1EBC075DAE47@robburns.com>
On Jul 9, 2007, at 6:46 AM, James Graham wrote:
>
> What do you think needs to be said? Do we just need to warn authors  
> that certain elements are implied in text/html but not in XML?

Well, that's what we've been discussing. I think we were trying to
discuss the possibilities. We were mainly discussing UAs first, since
to me UA guidance needs to be settled before author guidance. And
again, I'm not trying to insult UA makers here (I feel like that may
be one source of the strange defensiveness, but I'm working on a UA
myself). However, the UA guidance isn't necessarily a settled issue.
As Maciej made clear, WebKit is still fixing this stuff by emulating
Opera, which also recently changed how it handles <tbody>.
Despite some confusion on these issues, there isn't a single right
way to do these things, and the sooner we can acknowledge that, the
easier our task will be.

>> Again, I don't think the spec has anything to say on this (though  
>> perhaps I missed it). So we've been discussing it to figure out if  
>> it should say something about this. This is not our way of saying
>> we don't appreciate the hard work that the WhatWG has put into
>> this. It's an impressive document by any measure. However, I can't
>> imagine any document that couldn't use some improvement.
>
> Of course it needs improvement. That's why we're here.
>
>>> It would be really useful if, any time you want to talk about the  
>>> parsing-behavior of current UAs, you could post the source of  
>>> some example input and DOM produced from that input.
>> Several posts in this discussion included source samples and  
>> discussed the results. Many of us have DOM viewers built into our  
>> browsers.
>
> So do I. The point is it helps to make sure everyone is on the same  
> page if we have a testcase in a form where anyone reading the  
> message is sure of a) what, exactly, has been tested and b) what  
> the results are. The Live DOM Viewer makes this easy.

However, didn't you say that the Live DOM Viewer is for text/html?
This entire thread has mostly been focussed on the XML serialization,
so the Live DOM Viewer wouldn't work for this thread. We all
understand the way text/html is processed; however, there have
been some surprises on the XML side (for example, Safari processing
XML in the same way as text/html and inserting an implied <tbody>
into the DOM). Maciej also said that the Opera and (eventually, new)
WebKit way of processing this will be to insert an anonymous tbody.
CSS has anonymous boxes. However, it doesn't have an anonymous tbody
box. Either Maciej is confusing these two things or there's a new
concept being introduced here: a CSS inferred tbody box (to coin a
phrase).
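
For the XML side, one rough way to see what a parser with no
HTML-specific rules produces is to feed the same table markup to a
generic XML parser. The sketch below uses Python's standard
xml.etree.ElementTree. It is not a browser and says nothing about
what Safari, Opera, or Firefox actually do; the XHTML wrapper document
and the helper name child_names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# The same table as the text/html test case, written as well-formed XML.
# The surrounding html/head/body wrapper is an assumption for this sketch.
SOURCE = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>t</title></head>"
    "<body><table><tr><td></td></tr></table></body>"
    "</html>"
)


def child_names(element):
    """Return the local names of an element's direct children (hypothetical helper)."""
    return [child.tag.split("}", 1)[-1] for child in element]


root = ET.fromstring(SOURCE)
table = root.find(f".//{{{XHTML_NS}}}table")
table_children = child_names(table)
print(table_children)
```

With a generic XML parser the only child of the table is the tr that
was written in the source; nothing like a tbody is implied. Whether a
given UA's XML path behaves the same way is exactly what would need to
be tested in that UA.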

So can we please stop pretending that the UA issues are all solved
and we merely have to get these silly misguided authors in line?
It's not a fruitful way to approach this. We end up with so much more
email volume than we would otherwise need. I'm not asking commercial
developers to say "look everybody our product is crap." Obviously
there aren't any crap products here and so there's no reason for this
dodginess. But our draft is going to need to address some of the
ambiguity surrounding these issues. It's because there isn't a single
right way to do this that the ambiguity needs to be addressed.

> For example:
>
> http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C%21DOCTYPE%20html%3E%0D%0A%3Ctable%3E%0D%0A%3Ctr%3E%3Ctd%3E%3C/td%3E%3C/tr%3E%0D%0A%3C/table%3E
> will show you the behavior of the input:
>
> <!DOCTYPE html>
> <table>
> <tr><td></td></tr>
> </table>
>
> In any browser. In Opera 9.21 and Firefox 2.0.0.4 (these are
> the only browsers I have access to right now) a <tbody> element is  
> placed in the DOM so the resulting tree is, modulo some minor  
> differences (which are not important to the issue at hand), and  
> with the whitespace text nodes removed:
> # DOCTYPE: html
> # HTML
>     * HEAD
>     * BODY
>           o TABLE
>                 + TBODY
>                       # TR
>                             * TD
>
> It probably isn't actually necessary to always give the full DOM  
> tree as long as it's obvious exactly what you tested, how to  
> reproduce the test and what the results are.
>
>>> It would also be helpful if you could compare this behavior with  
>>> that in the current spec; the html5lib parsetree viewer provides  
>>> a simple[1] way to do this.
>> Where do we find the html5lib parsetree viewer?
>
> http://james.html5.org/parsetree.html
>
> The output for the example above is (again removing whitespace text  
> nodes):
> |DOCTYPE: html
> |html
>   |head
>   |body
>     |table
>       |tbody
>         |tr
>           |td
>
> You can see all the output returned at
> http://james.html5.org/cgi-bin/parsetree/parsetree.py?source=%3C%21DOCTYPE+html%3E%0D%0A%3Ctable%3E%0D%0A%3Ctr%3E%3Ctd%3E%3C%2Ftd%3E%3C%2Ftr%3E%0D%0A%3C%2Ftable%3E


I asked this before, but I'll try again. Can I use this for documents
that are serialized and delivered as XML?

Take care,
Rob
Received on Monday, 9 July 2007 12:17:41 UTC