- From: David Sheets <kosmo.zb@gmail.com>
- Date: Tue, 22 Jan 2013 15:04:29 -0800
- To: Henri Sivonen <hsivonen@iki.fi>
- Cc: www-tag@w3.org
On Mon, Jan 21, 2013 at 11:03 PM, Henri Sivonen <hsivonen@iki.fi> wrote:
> On Mon, Jan 21, 2013 at 11:15 PM, David Sheets <kosmo.zb@gmail.com> wrote:
>> I was under the impression that an explicit goal of standardizing the
>> HTML5 parser was so that HTML consumers and producers could rely on a
>> de jure interpretation of nonsensical markup.
>
> Consumers, yes. Logically, it will eventually mean that producers can,
> too, but we’ve tried to avoid advertising that.

Eventually? Programmers already know this and may write their software
with it in mind.

>> While many consider
>> XML's restrictions nonsensical, it is prima facie absurd that
>> champions of HTML5's apologetic parser refuse to consider the subset
>> of HTML5 that is also valid XHTML5 as clearly important to a
>> population of authors.
>
> I don’t think it’s absurd for HTML parser champions to be opposed to
> polyglot, since championing HTML parsing involves asking people to
> possess an HTML parser (everyone already has an XML parser).

HTML parser champions cite liberal interpretation as a primary benefit
of their approach. A subset of HTML serializations happens to also be
well-formed XML due to this approach. How are nonsensical HTML
serializations acceptable, but strict rules for dual HTML/XML
serializations offensive?

> If you
> possess an HTML parser and an XML parser, you don’t need polyglot. If
> you get text/html, you use the HTML parser. If you get
> application/xhtml+xml, you use the XML parser.

If you possess only a single parser and someone hands you an (X)HTML
polyglot document, you are guaranteed to be able to parse it reliably.
With polyglot, only a single bytestream needs to be published for
compatible parsing. With polyglot, little to no server-side
participation is required to publish both HTML and XML.

>> From my perspective, anti-polyglot proponents advocate global
>> text/html interpretation of nearly everything *except* XHTML.
>
> I’m advocating for text/html interpretation of text/html *only*. I am
> advocating against text/html interpretation of application/xhtml+xml
> and vice versa.

There exist bytestreams that have identical meaning in both text/html
and application/xhtml+xml. I don't think anyone is suggesting that
text/html and application/xhtml+xml should completely unify their
semantics beyond what is already clearly aligned and easily
interpretable.

>> XHTML is
>> stricter than HTML and polyglot serializations *should* exist for any
>> DOM
>
> Impossible in the general case without breaking backwards
> compatibility of either HTML or XML. I.e. not worth the trouble.

"General case" is not applicable when we are discussing a subset of the
semantics of HTML and a subset of the semantics of XHTML. Could you give
some examples of DOMs that are expressible in HTML but not in XML, and
explain how accommodating an XML serialization of those DOMs would
necessitate breaking backward compatibility for HTML?

>> I am genuinely confused by arguments which appear to encourage liberal
>> emission and deride conservative emission.
>
> Do you consider “valid HTML for text/html and valid XHTML for
> application/xhtml+xml” “liberal”?

Is "Hello, <p><div>world</p>!" a valid HTML document? Why is it more
reasonable to grant this document definitive semantic meaning than a
document that happens to satisfy both text/html and
application/xhtml+xml?

A serialization that is valid for both text/html and application/xml is
quite conservative. Assigning almost any HTML-ish document a definitive
DOM and associated semantics is quite liberal.

HTML5 > valid HTML5 > polyglot

Regards,

David

P.S. Your mail client appears to be destroying conversation metadata
(To, Cc), which makes it harder to have a conversation now and to
understand it in the future. </irony>
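The single-parser argument in this message can be illustrated with a minimal Python sketch using only the standard library. It assumes a hypothetical polyglot-style document (`POLYGLOT`, not from the thread) and reuses the tag-soup example from the message (`TAG_SOUP`). One caveat: Python's `html.parser` is only an HTML tokenizer, not an HTML5 tree builder, so the sketch shows that both inputs tokenize as HTML but does not model how the HTML5 parser repairs the tag soup. The XML check uses `xml.etree.ElementTree`, which rejects anything that is not well-formed.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from typing import List

# Hypothetical polyglot-style document (an illustration, not from the thread):
# well-formed XML that is also acceptable text/html.
POLYGLOT = (
    '<!DOCTYPE html>'
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head><title>Polyglot</title></head>'
    '<body><p>Hello, world!</p></body>'
    '</html>'
)

# The tag-soup example from the message.
TAG_SOUP = 'Hello, <p><div>world</p>!'


class StartTagCollector(HTMLParser):
    """Records start-tag names as seen by the stdlib HTML tokenizer."""

    def __init__(self):
        super().__init__()
        self.tags: List[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parses_as_xml(doc: str) -> bool:
    """True if a strict XML parser accepts the document (it is well-formed)."""
    try:
        ET.fromstring(doc)
    except ET.ParseError:
        return False
    return True


def html_start_tags(doc: str) -> List[str]:
    """Start tags found by the HTML tokenizer (no HTML5 tree construction)."""
    collector = StartTagCollector()
    collector.feed(doc)
    collector.close()
    return collector.tags


if __name__ == "__main__":
    for name, doc in [("polyglot", POLYGLOT), ("tag soup", TAG_SOUP)]:
        print(name, "| XML:", parses_as_xml(doc),
              "| HTML start tags:", html_start_tags(doc))
```

Both documents tokenize as HTML, but only the polyglot one also survives a strict XML parser. Deciding which DOM the tag soup denotes is left to the full HTML5 tree-construction algorithm, which this sketch does not implement.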
Received on Tuesday, 22 January 2013 23:06:27 UTC