Re: handling fallback content for still images from Henri Sivonen on 2007-07-09 (public-html@w3.org from July 2007)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Mon, 9 Jul 2007 11:04:32 +0300
To: Robert Burns <rob@robburns.com>
Cc: Jon Barnett <jonbarnett@gmail.com>, "Thomas Broyer" <t.broyer@gmail.com>, public-html@w3.org
Message-Id: <05FFFAC3-F914-451A-B2A7-BBEAC81A2537@iki.fi>

On Jul 9, 2007, at 10:15, Robert Burns wrote:

> I sense that some of the confusion (mine and others') in this
> thread may be over what an HTML5 parser is.

An HTML5 parser is a piece of software that implements the section of  
the spec titled "Parsing HTML documents".
http://www.w3.org/html/wg/html5/#parsing

> As we're defining HTML5 to accept two different serializations, I  
> thought an HTML5 parser would be a parser capable of parsing HTML5
> whether it was from the xml serialization and delivered as  
> application/xml, text/xml, application/xhtml+xml (and several other  
> MIME types) or the classic serialization and delivered as text/html.

No, this is not so.

Requirements for labeling (normative for markup producers):
http://www.w3.org/html/wg/html5/#xhtml5

Informative section that makes this clearer:
http://www.w3.org/html/wg/html5/#html-vs

Requirement for markup consumers:
http://www.w3.org/html/wg/html5/#parsing

If a stream of bytes is delivered as text/html, the stream of bytes  
must be parsed using an HTML5 parser. If a stream of bytes is  
delivered as application/xhtml+xml, the byte stream must be parsed
using an XML parser.
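
As a rough illustration of that rule, here is a minimal Python sketch of
choosing a parser from the label alone. The function name choose_parser
and the XML_TYPES set are made up for this example; treating
application/xml, text/xml and "+xml" types as XML is an assumption taken
from the MIME types listed in the quoted text, not a quote from the spec.

```python
# Hypothetical helper: pick a parser family from the Content-Type label only.
# The set below is an assumption based on the MIME types named in this thread.
XML_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}


def choose_parser(content_type: str) -> str:
    """Return 'html' or 'xml' for a Content-Type header value."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime == "text/html":
        return "html"
    if mime in XML_TYPES or mime.endswith("+xml"):
        return "xml"
    raise ValueError(f"no parser rule in this sketch for {mime!r}")
```

Note that the decision never looks at the bytes themselves: the same
stream of bytes gets a different parser depending on how it is labeled.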

> However, this comment seems to indicate that an HTML5 parser only  
> parses the classic serialization. Is that how you understand it?

Yes, it is. (And it isn't only my understanding, either.)

> So there won't be an HTML5 parser that's capable of parsing the xml  
> serialization. Is that right?

Right. (Well, not capable of parsing an arbitrary XHTML5 document  
into the same tree as an XML parser. An HTML5 parser will parse any  
input stream of bytes into *something*.)
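
To make the contrast concrete, here is a small Python sketch. Python's
html.parser is NOT an HTML5 parser; it is used only as a stand-in for
"a tag-soup parser that does not reject input", and the helper names
(TagCollector, xml_accepts, html_start_tags) are invented for the
example.

```python
from html.parser import HTMLParser
from xml.dom import minidom
from xml.parsers.expat import ExpatError


class TagCollector(HTMLParser):
    """Stand-in for a forgiving HTML parser; records start tags it sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def xml_accepts(markup: str) -> bool:
    """True if an XML parser accepts the markup as well-formed."""
    try:
        minidom.parseString(markup)
        return True
    except ExpatError:
        return False


def html_start_tags(markup: str) -> list:
    """Feed markup to the forgiving parser and return the start tags seen."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


# Ill-formed as XML: neither <p> nor <b> is closed.
BROKEN = "<p>unclosed <b>text"
```

Here xml_accepts(BROKEN) fails on the well-formedness error, while
html_start_tags(BROKEN) still reports the p and b elements.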

> Even if that is correct, I think it just moves our problem to other  
> than the parser (which I'm not sure anyone was even saying it had  
> to be about the parser). We will still have HTML5 UAs that will  
> build a tree with a <tr> as a child of a <table>. We still may need  
> to deal with conversions between HTML5 serializations.

So far the attitude has been that it is more important to let authors  
omit <tbody> in conforming XHTML5 than to have perfect round-tripping  
of conforming documents. It is obvious that we won't have perfect  
round-tripping of some non-conforming documents.

> Even if we go this Safari/Opera route and recommend an anonymous
> tbody element for CSS purposes, there will still be a difference  
> for DOM purposes. That is we still need to think about how we move  
> between and among HTML5 xml, HTML5 text/html and HTML5 DOM
> (relating to tbody, col / colgroup, body and head).

Currently it is the case that:
  * When parsing non-conforming documents, the HTML5 parsing  
algorithm can produce DOM trees that are not serializable as XML. It  
has to be this way for backwards compatibility.
  * When parsing *conforming* documents, the HTML5 parsing algorithm  
can produce DOM trees that are not serializable as XML. Since we get  
to define conformance, it does not have to be this way for backwards  
compatibility, but so far certain restrictions of XML 1.0 have been  
seen as onerous to inflict upon authors who use the text/html  
serialization.
  * Parsing the XML serialization or modifying a DOM tree by  
scripting can lead to tree shapes that when serialized as text/html  
and parsed back result in a different tree.
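
One way to picture "a DOM tree that is not serializable as XML" is a
comment node whose data contains "--", which XML 1.0 does not allow
inside a comment. The Python sketch below uses xml.dom.minidom as a
stand-in DOM; the helper name can_serialize_as_xml is made up, and the
choice of this particular tree shape is my illustration, not a list of
the cases the spec has in mind.

```python
from xml.dom import minidom


def can_serialize_as_xml(node) -> bool:
    """True if minidom can write the node out as XML."""
    try:
        node.toxml()
        return True
    except ValueError:
        # minidom refuses to write a comment whose data contains "--".
        return False


doc = minidom.Document()
table = doc.createElement("table")
doc.appendChild(table)

# A harmless comment: the tree can still be written as XML.
table.appendChild(doc.createComment("fine"))
ok_before = can_serialize_as_xml(doc)

# The DOM API accepts this node, but it has no XML serialization.
table.appendChild(doc.createComment("a -- b"))
ok_after = can_serialize_as_xml(doc)
```

The tree is a perfectly good in-memory DOM in both cases; only the
second one has no XML 1.0 form.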

> Well if someone:
> 1)  begins to build a table in the DOM and builds one without an  
> explicit tbody,
> 2) then serializes to text/html
> 3) then the table will have no tbody in the text/html  
> serialization, right?
>
> Upon de-serialization, the DOM will have a tbody though.

Correct.
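
The three steps can be sketched in Python as follows. This is a toy:
ToyTableBuilder is a made-up class that imitates only the one rule
under discussion (a tr seen directly inside table gets an implied tbody
parent). It is not the HTML5 tree construction algorithm, and
xml.etree.ElementTree merely stands in for the DOM.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class ToyTableBuilder(HTMLParser):
    """Toy tree builder: implements only the implied-tbody rule."""

    def __init__(self):
        super().__init__()
        self.root = None
        self.stack = []

    def _push(self, tag):
        if self.stack:
            node = ET.SubElement(self.stack[-1], tag)
        else:
            node = ET.Element(tag)
            self.root = node
        self.stack.append(node)

    def handle_starttag(self, tag, attrs):
        # A <tr> directly inside <table> gets an implied <tbody> parent.
        if tag == "tr" and self.stack and self.stack[-1].tag == "table":
            self._push("tbody")
        self._push(tag)

    def handle_endtag(self, tag):
        # Pop up to and including the matching open element, closing
        # implied elements (such as tbody) along the way.
        if any(node.tag == tag for node in self.stack):
            while self.stack.pop().tag != tag:
                pass

    def handle_data(self, data):
        # Simplification: text is only expected inside leaf cells.
        if self.stack and data.strip():
            node = self.stack[-1]
            node.text = (node.text or "") + data


def child_tags(element):
    return [child.tag for child in element]


# 1) Build a table in the "DOM" without an explicit tbody.
table = ET.Element("table")
row = ET.SubElement(table, "tr")
cell = ET.SubElement(row, "td")
cell.text = "cell"

# 2) Serialize using HTML syntax; no tbody appears in the markup.
markup = ET.tostring(table, encoding="unicode", method="html")

# 3) Parse the markup back; the toy builder inserts the implied tbody.
builder = ToyTableBuilder()
builder.feed(markup)
builder.close()
reparsed = builder.root
```

Before serialization the table's only child is tr; after the round trip
its only child is tbody, with the tr one level further down.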

> Or do we want something different?

Probably not. This all seems awfully inelegant, but addressing this  
"problem" would likely be awfully annoying in practice in cases where  
the presence or absence of tbody doesn't really matter. Of course,
all this goes against the principle of least surprise, but that's a  
mistake we inherit from the HTML 4 era. We can't fix it without  
breaking backwards compatibility in some way.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Monday, 9 July 2007 08:04:58 UTC