
Re: The non-polyglot elephant in the room

From: David Sheets <kosmo.zb@gmail.com>
Date: Mon, 21 Jan 2013 13:15:45 -0800
Message-ID: <CAAWM5TypV_rG2yLQ1bK+aRoBSW1HCD5ZDTmVeWm-y-6G5o1u8g@mail.gmail.com>
To: Kingsley Idehen <kidehen@openlinksw.com>
Cc: www-tag@w3.org
On Mon, Jan 21, 2013 at 11:47 AM, Kingsley Idehen
<kidehen@openlinksw.com> wrote:
> On 1/21/13 2:19 PM, Melvin Carvalho wrote:
> On 21 January 2013 20:13, Anne van Kesteren <annevk@annevk.nl> wrote:
>> On Mon, Jan 21, 2013 at 7:24 PM, Kingsley Idehen <kidehen@openlinksw.com>
>> wrote:
>> > Please correct me if my characterization is wrong, but it appears to me
>> > that
>> > this entire affair is about content-type (mime type) squatting i.e.,
>> > trying
>> > to squeeze (X)HTML into content-type: text/html. If this is true, why on
>> > earth would such an endeavor be encouraged by the W3C or its TAG?

How does defining *a valid subset of text/html* amount to squatting?

>> Maybe because XML is listed quite prominently under "What is Web
>> architecture?" in http://www.w3.org/2004/10/27-tag-charter.html though
>> I would consider that particular part of the charter misguided. (It's
>> also not at all practiced these days.)

This is plainly false. The existence of new XML vocabularies
demonstrates continued practice, so it cannot be true that XML is "not
at all practiced these days".

> This is a good point, imho.  In 2004 it was perhaps reasonable to make a
> 'bet' on XML.  However, favouring any one particular serialization
> potentially lacks future proofing.  However, favouring the principles behind
> XML, such as namespacing etc.,  makes more sense.

Fragmentation is not future-proof.

> Wikipedia has a reasonably nice write up on this topic:
> http://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats
>> --
>> http://annevankesteren.nl/
> At this juncture though, my main question is about XHTML or (X)HTML (the
> polyglot) being squeezed into content-type designation: text/html. In
> reality we have two content types with distinct characteristics which
> thereby entails two distinct content-types: text/html (for HTML) and
> application/xhtml+xml (for XHTML).
> Put differently, there is no content-type for the (X)HTML polyglot. Thus, we
> have the struggle right now which is all about trying to make text/html the
> designated content-type for the aforementioned polyglot.

I was under the impression that an explicit goal of standardizing the
HTML5 parser was to let HTML consumers and producers rely on a de jure
interpretation of nonsensical markup. While many consider XML's
restrictions nonsensical, it is prima facie absurd that champions of
HTML5's apologetic parser refuse to acknowledge that the subset of
HTML5 that is also valid XHTML5 is clearly important to a population
of authors.

From my perspective, opponents of polyglot markup advocate global
text/html interpretation of nearly everything *except* XHTML. XHTML is
stricter than HTML, and a polyglot serialization *should* exist for any
DOM (at least one would hope, given the complexity burden of a fully
conformant HTML parser).
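
To make "polyglot" concrete, here is a minimal sketch in Python, under
loud assumptions: the sample document and every function name are
hypothetical, and Python's html.parser is a lenient tokenizer, not a
conformant HTML5 parser, so it only stands in for one. The sketch feeds
the same bytes to an XML parser and to the HTML tokenizer and checks
that both report the same elements in the same order.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample: markup written to be well-formed XML and ordinary HTML.
POLYGLOT = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head><title>Demo</title></head>'
    '<body><p>Hello<br/>world</p></body>'
    '</html>'
)


class TagCollector(HTMLParser):
    """Record element names in document order as html.parser reports them."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # Also reached for "<br/>": the default handle_startendtag
        # delegates to handle_starttag and handle_endtag.
        self.names.append(tag)


def html_element_names(markup):
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.names


def xml_element_names(markup):
    root = ET.fromstring(markup)  # raises ET.ParseError if not well-formed
    # ElementTree spells namespaced tags as "{uri}name"; keep the local name.
    return [el.tag.rsplit('}', 1)[-1] for el in root.iter()]


def same_element_order(markup):
    """True if both parsers see the same element names in the same order."""
    try:
        return xml_element_names(markup) == html_element_names(markup)
    except ET.ParseError:
        return False
```

Markup such as '<p>Hello<br>world</p>' is acceptable to the HTML
tokenizer but is not well-formed XML, so same_element_order returns
False for it. This compares element names only; it says nothing about
attributes, text, or the other constraints a real polyglot profile
would have to cover.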

Are there legitimate technical architecture objections to specifying
the set intersection of XHTML and HTML expressions?

I believe that there are many who would be interested in such
guidelines who are typically underrepresented in these discussions.

I am genuinely confused by arguments which appear to encourage liberal
emission and deride conservative emission. Are web standards no longer
concerned with robustness? HTML's new parser specification appears to


David Sheets
Received on Monday, 21 January 2013 21:17:03 UTC
