[Bug 14565] Chain of normative statements connecting MIME type to HTML vs. XHTML is broken or unobvious

http://www.w3.org/Bugs/Public/show_bug.cgi?id=14565

--- Comment #2 from Henri Sivonen <hsivonen@iki.fi> 2011-10-27 08:43:39 UTC ---
I'm looking for the following:

 * A normative requirement that says which parser must be used for which input
MIME types.

 * A normative terminology definition for an "XHTML document". (The spec talks
about writing and parsing them, which suggests that an "XHTML document" is at
least a sequence of characters or bytes.)

 * Better clarity that the terms "HTML document" and "XML document" are
normatively defined by DOM4 / Web DOM Core.

 * Since "HTML document" and "XML document" are defined to be data structures
in DOM4 but an "XML document" is defined as textual source in the XML spec,
better clarity about terminology (possibly saying that the terms refer to both
data structures and their serializations) and an explanation of how one can
"write" "HTML documents" if they are defined as data structures rather than
serializations.

 * A normative statement requiring a serialization constructed by following the
"writing HTML documents" section to be labeled as text/html and a requirement
for a serialization constructed by following the "writing XHTML documents"
section to be labeled as application/xhtml+xml.
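A minimal sketch of the labeling requirement in the last bullet, in Python. The enum `Serialization` and the function `required_content_type` are hypothetical names made up for illustration; the only point is that the choice of writing rules fixes the label.

```python
from enum import Enum


class Serialization(Enum):
    # Which "writing ... documents" section produced the bytes (hypothetical enum).
    HTML = "writing HTML documents"
    XHTML = "writing XHTML documents"


def required_content_type(kind: Serialization) -> str:
    """Return the MIME type a serialization must be labeled with."""
    if kind is Serialization.HTML:
        return "text/html"
    if kind is Serialization.XHTML:
        return "application/xhtml+xml"
    raise ValueError(f"unknown serialization: {kind!r}")


print(required_content_type(Serialization.XHTML))  # application/xhtml+xml
```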

I'd be OK with fixing this by saying the following:

User agents must parse text/html resources according to the Parsing HTML
Documents section. (Note: The rules in that section construct a Document object
that is marked as being an HTML document.) User agents must parse
application/xhtml+xml resources and application/xml resources according to the
Parsing XHTML Documents section. (Note: The rules in that section construct a
Document object that is marked as being an XML document.)
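The dispatch in the proposed text amounts to a lookup from MIME type to parser. A rough Python sketch, assuming the MIME type has already been taken from out-of-band metadata (e.g. the HTTP Content-Type header). `select_parser` and its return values are hypothetical, parameter stripping is simplified, and only the three types named above are covered.

```python
from typing import Dict, Literal

Parser = Literal["html", "xml"]

# Only the MIME types named in the proposed text (hypothetical table).
_PARSER_FOR_MIME: Dict[str, Parser] = {
    "text/html": "html",             # Parsing HTML Documents -> Document marked as HTML
    "application/xhtml+xml": "xml",  # Parsing XHTML Documents -> Document marked as XML
    "application/xml": "xml",
}


def select_parser(content_type: str) -> Parser:
    # Simplified: drop parameters such as "; charset=utf-8" and lowercase the rest.
    essence = content_type.split(";", 1)[0].strip().lower()
    try:
        return _PARSER_FOR_MIME[essence]
    except KeyError:
        raise ValueError(f"no parser rule for {essence!r}") from None
```

For example, `select_parser("text/html; charset=utf-8")` selects the HTML parser, while `select_parser("text/plain")` raises, because the proposed text says nothing about other types.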

The terms HTML document, XML document and XHTML document can refer to both data
structures and sequences of bytes. When referring to data structures, the terms
"HTML document" and "XML document" are defined in DOM4. When referring to data
structures, "XHTML document" is defined (right here) as an XML document whose
root element is in the HTML namespace.
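In the data-structure sense, the proposed definition of "XHTML document" reduces to a single check on the root element. A Python sketch using the standard library's xml.dom.minidom. minidom has no notion of a Document being marked as an HTML document, so every Document it builds is assumed here to be an XML document. The function name is hypothetical.

```python
from xml.dom import minidom

HTML_NAMESPACE = "http://www.w3.org/1999/xhtml"


def is_xhtml_document(doc: minidom.Document) -> bool:
    """True if this (XML) Document's root element is in the HTML namespace."""
    root = doc.documentElement
    return root is not None and root.namespaceURI == HTML_NAMESPACE


doc = minidom.parseString('<html xmlns="http://www.w3.org/1999/xhtml"/>')
print(is_xhtml_document(doc))  # True
```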

When referring to streams of bytes, an "HTML document" is a stream of bytes
labeled as text/html by out-of-band metadata. [Defining XML document left as an
exercise to the editor, because the XML spec defines XML documents as streams
that satisfy particular syntactic requirements, which leaves open the problem of
what to call byte streams that are labeled as application/xml but don't satisfy
the syntactic requirements.] When referring to streams of bytes, an XHTML
document is an XML document which upon parsing would result in a data structure
whose root element is in the HTML namespace.
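The byte-stream definitions can be sketched as a classifier over (label, bytes) pairs. This Python sketch uses xml.etree.ElementTree as a stand-in for the Parsing XHTML Documents rules (it is not that algorithm), and the returned strings are hypothetical. The "not well-formed" case is a placeholder for exactly the byte streams the comment leaves without a name.

```python
import xml.etree.ElementTree as ET

HTML_NAMESPACE = "http://www.w3.org/1999/xhtml"
XML_TYPES = {"application/xhtml+xml", "application/xml"}


def classify_byte_stream(content_type: str, data: bytes) -> str:
    # Simplified label handling: drop parameters, lowercase the essence.
    essence = content_type.split(";", 1)[0].strip().lower()
    if essence == "text/html":
        return "HTML document"  # defined by the out-of-band label alone
    if essence not in XML_TYPES:
        return "unclassified"
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        # Labeled as XML but not well-formed: the open terminology question.
        return "XML-labeled, not well-formed"
    # ElementTree spells namespaced tags as "{namespace}localname".
    if root.tag.startswith("{" + HTML_NAMESPACE + "}"):
        return "XHTML document"
    return "XML document"
```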

[Requirements for labeling when writing docs in a way that avoids circularities
left as an exercise to the editor.]


Received on Thursday, 27 October 2011 08:43:49 UTC