Re: edge issues with DOM, text/html, and xml serializations [was Re: handling fallback content for still images] from James Graham on 2007-07-09 (public-html@w3.org from July 2007)

From: James Graham <jg307@cam.ac.uk>
Date: Mon, 09 Jul 2007 12:46:10 +0100
To: Robert Burns <rob@robburns.com>
CC: public-html@w3.org
Message-ID: <46922002.9060208@cam.ac.uk>
Robert Burns wrote:
> 
> 
> On Jul 9, 2007, at 4:41 AM, James Graham wrote:
> 
>> Sorry, I'm really confused.
>>
>> It would be really helpful if you could summarize the issues that you 
>> are trying to address in this thread; I really seem to be lost amongst 
>> the various discussions and subthreads.
> 
> The most recent part of this thread was a discussion about how 
> conversions and authoring should be handled relative to the edge-case 
> differences with the HTML5 DOM, HTML5's text/html serialization and 
> HTML5's XML serialization. This thread forked from the discussion of the 
> need to deal with the fact that an XML serialization could potentially 
> include <img>fallback</img>, and what we should do about that errant code.

OK, that seems like an issue that should be addressed. Assuming we want children 
of <img> elements in XML to be hidden, I assume the rendering section will cover 
it. I also see that there may be some merit in using this content as fallback in 
non-visual browsers although I have no idea what the compatibility story on this 
is. I also think it's a bit of a distraction since it doesn't address the 99% 
case of text/html. Indeed, I would regard this as sufficient reason to stop 
considering it as a fallback mechanism - since we will have to come up with a 
mechanism that works in text/html as well, there is no point in having multiple 
ways of doing the same thing.

> It then turned toward non-errant edge-case issues of implied <tbody> 
> and implied <colgroup> (and perhaps even implied <body> and <head>, 
> though those are perhaps not of practical concern). Overall, I'd say the 
> thread has turned toward discussing what (if anything) the draft should 
> say (or does say) about these issues. It isn't that we need a magic 
> bullet that solves these issues; rather, it's about how we should deal 
> with them (if we should deal with them at all).

What do you think needs to be said? Do we just need to warn authors that certain 
elements are implied in text/html but not in XML?

> Again, I don't think the spec has anything to say on this (though 
> perhaps I missed it). So we've been discussing it to figure out if it 
> should say something about this. This is not our way of saying we don't 
> appreciate the hard work that the WhatWG has put into this. It's an 
> impressive document by any measure. However, I can't imagine any 
> document that couldn't use some improvement.

Of course it needs improvement. That's why we're here.

>> It would be really useful if, any time you want to talk about the 
>> parsing-behavior of current UAs, you could post the source of some 
>> example input and DOM produced from that input.
> 
> Several posts in this discussion included source samples and discussed 
> the results. Many of us have DOM viewers built into our browsers.

So do I. The point is it helps to make sure everyone is on the same page if we 
have a testcase in a form where anyone reading the message is sure of a) what, 
exactly, has been tested and b) what the results are. The Live DOM Viewer makes 
this easy.

For example:

http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C%21DOCTYPE%20html%3E%0D%0A%3Ctable%3E%0D%0A%3Ctr%3E%3Ctd%3E%3C/td%3E%3C/tr%3E%0D%0A%3C/table%3E 
will show you the behavior of the input:

<!DOCTYPE html>
<table>
<tr><td></td></tr>
</table>

in any browser. In Opera 9.21 and Firefox 2.0.0.4 (these are the only 
browsers I have access to right now) a <tbody> element is placed in the DOM, so 
the resulting tree is, modulo some minor differences (which are not important to 
the issue at hand), and with the whitespace text nodes removed:
# DOCTYPE: html
# HTML
     * HEAD
     * BODY
           o TABLE
                 + TBODY
                       # TR
                             * TD

It probably isn't actually necessary to always give the full DOM tree as long as 
it's obvious exactly what you tested, how to reproduce the test and what the 
results are.
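
As an aside, the Live DOM Viewer link above is just the test input 
percent-encoded into the query string, so a link for any testcase can be built 
mechanically. A minimal Python sketch using only the standard library; the 
function name viewer_url is mine, and I am assuming the viewer accepts any 
percent-encoded source after the "?", as it does for the link above:

```python
from urllib.parse import quote, unquote

# Base address of the Live DOM Viewer, taken from the link above.
VIEWER = "http://software.hixie.ch/utilities/js/live-dom-viewer/"


def viewer_url(source: str) -> str:
    """Return a viewer link carrying `source` in the query string.

    "/" is left unescaped to match the form of the link above.
    """
    return VIEWER + "?" + quote(source, safe="/")


# The testcase from this message, with CRLF line endings as in the link.
test_input = "<!DOCTYPE html>\r\n<table>\r\n<tr><td></td></tr>\r\n</table>"

url = viewer_url(test_input)
print(url)

# Decoding the query string recovers exactly what was tested.
recovered = unquote(url.split("?", 1)[1])
print(recovered == test_input)
```

Decoding the query string of a posted link the same way lets a reader confirm 
exactly what input was tested.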

>> It would also be helpful if you could compare this behavior with that 
>> in the current spec; the html5lib parsetree viewer provides a 
>> simple[1] way to do this.
> 
> Where do we find the html5lib parsetree viewer?

http://james.html5.org/parsetree.html

The output for the example above is (again removing whitespace text nodes):
|DOCTYPE: html
|html
   |head
   |body
     |table
       |tbody
         |tr
           |td

You can see all the output returned at 
http://james.html5.org/cgi-bin/parsetree/parsetree.py?source=%3C%21DOCTYPE+html%3E%0D%0A%3Ctable%3E%0D%0A%3Ctr%3E%3Ctd%3E%3C%2Ftd%3E%3C%2Ftr%3E%0D%0A%3C%2Ftable%3E
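
For contrast with the XML side of this thread: an XML parser adds no implied 
elements, so the same table markup yields a tree with no <tbody>. A rough 
sketch using Python's standard xml.etree.ElementTree; this is a generic XML 
parser standing in for a browser's XML handling (an assumption on my part, not 
a test of any UA), and the DOCTYPE and namespace are left out for brevity:

```python
import xml.etree.ElementTree as ET

# Same table as the testcase above, parsed as XML rather than text/html.
source = "<table>\n<tr><td></td></tr>\n</table>"
root = ET.fromstring(source)


def tags(elem, depth=0):
    """Yield (depth, tag) for every element, in document order."""
    yield depth, elem.tag
    for child in elem:
        yield from tags(child, depth + 1)


tree = list(tags(root))
for depth, tag in tree:
    print("  " * depth + "|" + tag)

# <tr> is a direct child of <table>; nothing was inserted between them.
print(root.find("tbody") is None)
```

Here the tree is table, tr, td with nothing implied, which is the text/html 
versus XML difference under discussion.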
-- 
"Eternity's a terrible thought. I mean, where's it all going to end?"
  -- Tom Stoppard, Rosencrantz and Guildenstern are Dead
Received on Monday, 9 July 2007 11:46:18 UTC