- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Mon, 9 Jul 2007 11:04:32 +0300
- To: Robert Burns <rob@robburns.com>
- Cc: Jon Barnett <jonbarnett@gmail.com>, "Thomas Broyer" <t.broyer@gmail.com>, public-html@w3.org
On Jul 9, 2007, at 10:15, Robert Burns wrote:

> I sense that some of the confusion (mine and others) in this
> thread may be over what an HTML5 parser is.

An HTML5 parser is a piece of software that implements the section of the spec titled "Parsing HTML documents".
http://www.w3.org/html/wg/html5/#parsing

> As we're defining HTML5 to accept two different serializations, I
> thought an HTML5 parser would be an parser capable of parsing HTML5
> whether it was from the xml serialization and delivered as
> application/xml, text/xml, application/xhtml+xml (and several other
> MIME types) or the classic serialization and delivered as text/html.

No, this is not so.

Requirements for labeling (normative for markup producers):
http://www.w3.org/html/wg/html5/#xhtml5

Informative section that makes this clearer:
http://www.w3.org/html/wg/html5/#html-vs

Requirement for markup consumers:
http://www.w3.org/html/wg/html5/#parsing

If a stream of bytes is delivered as text/html, the stream of bytes must be parsed using an HTML5 parser. If a stream of bytes is delivered as application/xhtml+xml, the byte stream must be parsed using an XML parser.

> However, this comment seems to indicate that an HTML5 parser only
> parses the classic serialization. Is that how you understand it?

Yes, it is. (And it isn't only my understanding, either.)

> So there won't be an HTML5 parser that's capable of parsing the xml
> serialization. Is that right?

Right. (Well, not capable of parsing an arbitrary XHTML5 document into the same tree as an XML parser. An HTML5 parser will parse any input stream of bytes into *something*.)

> Even if that is correct, I think it just moves our problem to other
> than the parser (which I'm not sure anyone was even saying it had
> to be about the parser). We will still have HTML5 UAs that will
> build a tree with a <tr> as a child of a <table>. We still may need
> to deal with conversions between HTML5 serializations.

So far the attitude has been that it is more important to let authors omit <tbody> in conforming XHTML5 than to have perfect round-tripping of conforming documents. It is obvious that we won't have perfect round-tripping of some non-conforming documents.

> Even, if we go this Safari/Opera route and recommend an anonymous
> tbody element for CSS purposes, there will still be a difference
> for DOM purposes. That is we still need to think about how we move
> between and among HTML5 xml, HTML5 texts/html and HTML5 DOM
> (relating to tbody, col / colgroup, body and head).

Currently it is the case that:

* When parsing non-conforming documents, the HTML5 parsing algorithm can produce DOM trees that are not serializable as XML. It has to be this way for backwards compatibility.

* When parsing *conforming* documents, the HTML5 parsing algorithm can produce DOM trees that are not serializable as XML. Since we get to define conformance, it does not have to be this way for backwards compatibility, but so far certain restrictions of XML 1.0 have been seen as onerous to inflict upon authors who use the text/html serialization.

* Parsing the XML serialization or modifying a DOM tree by scripting can lead to tree shapes that, when serialized as text/html and parsed back, result in a different tree.

> Well if someone:
> 1) begins to build a table in the DOM and builds one without an
> explicit tbody,
> 2) then serializes to text/html
> 3) then the table will have no tbody in the text/html
> serialization, right?
>
> Upon de-serialization, the DOM will have a tbody though.

Correct.

> Or do we want something different?

Probably not. This all seems awfully inelegant, but addressing this "problem" would likely be awfully annoying in practice in cases where the presence or absence of tbody doesn't really matter. Of course, all this goes against the principle of least surprise, but that's a mistake we inherit from the HTML 4 era.
We can't fix it without breaking backwards compatibility in some way.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
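
Editorial illustration (not part of the original message): the tbody round-trip asymmetry discussed above can be sketched with a toy model. This is a minimal sketch, not the HTML5 parsing algorithm. The names Node, serialize, ToyTreeBuilder and parse are hypothetical, and the tree builder applies only one simplified rule modeled on the behavior described in the thread: a <tr> start tag seen directly inside <table> gets an implied <tbody> parent. Tokenization is delegated to Python's standard html.parser module.

```python
from html.parser import HTMLParser


class Node:
    """Minimal element node (hypothetical): a tag name plus children.
    Text children are plain str values."""

    def __init__(self, tag, children=None):
        self.tag = tag
        self.children = children if children is not None else []

    def shape(self):
        """Return the tree as nested tuples so two trees can be compared."""
        return (self.tag, tuple(
            c.shape() if isinstance(c, Node) else c for c in self.children))


def serialize(node):
    """Serialize a node as text/html-style markup with explicit end tags."""
    if isinstance(node, str):
        return node
    inner = "".join(serialize(c) for c in node.children)
    return f"<{node.tag}>{inner}</{node.tag}>"


class ToyTreeBuilder(HTMLParser):
    """Toy tree builder (assumption: one simplified rule only).
    A <tr> start tag directly inside <table> gets an implied <tbody>;
    everything else is naive nesting."""

    def __init__(self):
        super().__init__()
        self.root = Node("#fragment")
        self.stack = [self.root]

    def _open(self, tag):
        node = Node(tag)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_starttag(self, tag, attrs):
        if tag == "tr" and self.stack[-1].tag == "table":
            self._open("tbody")  # implied element, absent from the markup
        self._open(tag)

    def handle_endtag(self, tag):
        # Pop up to and including the matching open element; this also
        # closes an implied tbody when </table> arrives.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def parse(markup):
    """Parse markup with the toy builder and return the first top-level node."""
    builder = ToyTreeBuilder()
    builder.feed(markup)
    builder.close()
    return builder.root.children[0]


# 1) Build a table "in the DOM" without an explicit tbody.
dom = Node("table", [Node("tr", [Node("td", ["x"])])])
# 2) Serialize it: the markup contains no tbody.
markup = serialize(dom)
# 3) Parse it back: the tree now has a tbody, so the shapes differ.
reparsed = parse(markup)
print(markup)
print(serialize(reparsed))
print(dom.shape() == reparsed.shape())
```

In this sketch the second serialization contains a tbody that the first lacks, which is the "different tree" case from the third bullet of the message. A real HTML5 parser has many more insertion-mode rules than the single one modeled here.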
Received on Monday, 9 July 2007 08:04:58 UTC