- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Sun, 16 Nov 2008 11:06:13 +0200
- To: HTML WG <public-html@w3.org>
Under "2. HTML syntax", the term "HTML documents" is taken to mean both text/html and application/xhtml+xml data streams. The link to the definition is broken.

Under "2.2. HTML documents", it says that an HTML document must consist of, among other things, "A DOCTYPE", which isn't true for application/xhtml+xml data streams. Immediately before this section, it says "For the most part, the remaining subsections in this section provide details specific to the HTML syntax."

I think the various deliverables of this WG should be consistent in what an "HTML document" is.

1) Does it cover both HTML and XML serializations / DOM modes?
2) Does it cover a) byte streams, b) Unicode character streams, c) trees implementing certain DOM interfaces in certain modes and/or d) non-DOM in-memory data structures?

Note that in the XML spec, an "XML document" is primarily defined in terms of the textual form of the data object, but in the HTML 5 draft an "HTML document" is primarily defined in terms of a tree node implementing particular DOM interfaces in particular modes.

I suggest the following, which I believe best matches the way the terms are actually used by people:

"HTML document" should mean
a) A byte stream labeled text/html
b) A stream of Unicode characters that has the same textual interpretation as the above-mentioned byte stream
c) A DOM tree in the HTML mode
d) Another in-memory representation of such a tree if the tree carries some kind of HTMLness flag
e) A mathematical object that corresponds to such a concrete data structure

"XHTML document" should mean
a) A byte stream labeled application/xhtml+xml or another XML content type if, upon parsing, the root element would be in the XHTML namespace
b) A stream of Unicode characters that has the same textual interpretation as the above-mentioned byte streams
c) A DOM tree in the XML mode with the root element from the XHTML namespace
d) Another in-memory representation of XML with the root element in the XHTML namespace
e) An XML infoset whose root element is in the XHTML namespace

I don't have a suggestion for a term that would mean both HTML documents and XHTML documents collectively.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Sunday, 16 November 2008 09:06:55 UTC
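
As a rough illustration of case (a) in the two definitions suggested in the message, the labeling rule for byte streams could be sketched in Python along the following lines. This is only a sketch under assumptions that are not in the message: the function names `classify` and `is_xml_content_type` are hypothetical, "another XML content type" is approximated as text/xml, application/xml, or any type ending in +xml, and application/xhtml+xml is treated as XHTML without inspecting the root element, which is just one possible reading of the wording.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def is_xml_content_type(media_type: str) -> bool:
    # Assumption: approximate "XML content type" by the two generic
    # XML types plus any type carrying the "+xml" suffix.
    return media_type in ("text/xml", "application/xml") or media_type.endswith("+xml")


def classify(content_type: str, body: bytes) -> str:
    """Classify a labeled byte stream as an HTML document, an XHTML document, or neither."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()

    if media_type == "text/html":
        return "HTML document"

    # Assumption: the XHTML label alone is taken as sufficient.
    if media_type == "application/xhtml+xml":
        return "XHTML document"

    if is_xml_content_type(media_type):
        try:
            root = ET.fromstring(body)
        except ET.ParseError:
            # Not well-formed XML, so no root namespace can be determined.
            return "neither"
        # ElementTree spells a namespaced tag as "{namespace}localname".
        if root.tag.startswith("{" + XHTML_NS + "}"):
            return "XHTML document"

    return "neither"
```

For example, `classify("application/xml", b'<html xmlns="http://www.w3.org/1999/xhtml"/>')` would fall under the XHTML definition, while the same bytes labeled text/html would fall under the HTML definition, which reflects the point that the label, not the content alone, decides the term.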