"HTML document" in "HTML: the markup language"

Under "2. HTML syntax", the term "HTML documents" is taken to mean  
both text/html and application/xhtml+xml data streams. The link to the  
definition is broken. Under "2.2. HTML documents", it says that an  
HTML document must consist of, among other things, "A DOCTYPE", which  
isn't true for application/xhtml+xml data streams. Immediately before  
this section, it says "For the most part, the remaining subsections in  
this section provide details specific to the HTML syntax."

I think various deliverables of this WG should be consistent in what  
an "HTML document" is.
  1) Does it cover both HTML and XML serializations / DOM modes?
  2) Does it cover a) byte streams, b) Unicode character streams, c)  
trees implementing certain DOM interfaces in certain modes and/or d)  
non-DOM in-memory data structures?

Note that in the XML spec, an "XML document" is primarily defined in  
terms of the textual form of the data object, but in the HTML 5 draft  
an "HTML document" is primarily defined in terms of tree nodes  
implementing particular DOM interfaces in particular modes.

I suggest the following, which I believe best matches the way the  
terms are actually used by people:

"HTML document" should mean
  a) A byte stream labeled text/html
  b) A stream of Unicode characters that has the same textual  
interpretation as the above-mentioned byte stream
  c) A DOM tree in the HTML mode
  d) Another in-memory representation of such a tree if the tree  
carries some kind of HTMLness flag.
  e) A mathematical object that corresponds to such a concrete data  
structure.

"XHTML document" should mean
  a) A byte stream labeled application/xhtml+xml or another XML  
content type if upon parsing, the namespace of the root element would  
be in the XHTML namespace
  b) A stream of Unicode characters that has the same textual  
interpretation as the above-mentioned byte streams
  c) A DOM tree in the XML mode with the root element from the XHTML  
namespace
  d) Another in-memory representation of XML whose root element is  
in the XHTML namespace.
  e) An XML infoset whose root element is in the XHTML namespace.
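
As a rough illustration of how clause a) of each definition might be
applied to a labeled byte stream, here is a minimal Python sketch. The
names classify and is_xml_type and the returned strings are made up for
this example and do not come from any spec. The sketch also assumes
that the root-namespace condition applies to application/xhtml+xml as
well as to other XML content types, which is only one reading of the
wording above.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def is_xml_type(media_type):
    """Rough check for XML media types (illustrative, not exhaustive)."""
    return media_type in ("text/xml", "application/xml") or media_type.endswith("+xml")


def classify(content_type, data):
    """Classify a labeled byte stream per clause a) of the two definitions.

    Hypothetical helper: returns "HTML document", "XHTML document" or None.
    """
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type.split(";")[0].strip().lower()

    if media_type == "text/html":
        # "HTML document", clause a): the label alone decides.
        return "HTML document"

    if is_xml_type(media_type):
        # "XHTML document", clause a): the root element's namespace decides.
        try:
            root = ET.fromstring(data)
        except ET.ParseError:
            return None  # not well-formed, so there is no root element
        if root.tag.startswith("{" + XHTML_NS + "}"):
            return "XHTML document"

    return None
```

With this sketch, a stream labeled application/xhtml+xml whose root
element is in the XHTML namespace is classified as an "XHTML document"
even when it has no DOCTYPE, while an application/xml stream with an
SVG root element falls under neither term.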

I don't have a suggestion for a term that would mean both HTML  
documents and XHTML documents collectively.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Sunday, 16 November 2008 09:06:55 UTC