RE: The non-polyglot elephant in the room

From: Larry Masinter <masinter@adobe.com>
Date: Mon, 21 Jan 2013 14:32:14 -0800
To: Kingsley Idehen <kidehen@openlinksw.com>, David Sheets <kosmo.zb@gmail.com>
CC: "www-tag@w3.org" <www-tag@w3.org>
Message-ID: <C68CB012D9182D408CED7B884F441D4D1E3FF99BFA@nambxv01a.corp.adobe.com>
Recommendations should be appropriate for the scope that they claim to cover.
A recommendation for HTML which claims to satisfy all of the existing use cases of HTML (for web pages, for email, for embedded devices) has a strong requirement to be appropriate for all of those use cases.

However, the "scope" of polyglot is quite narrow: it is for a narrow set of applications which wish to serve the same content as text/html to HTML tools and as application/xhtml+xml to XML tools. As such, the fact that there are other use cases for which polyglot may not be appropriate is irrelevant.

I don't think the objections raised are taking the claimed scope of the specifications into account.

It is irrelevant whether these are 2% or 40% of use cases currently; the only question should be whether there is sufficient interest.

The standards process requires a linked but asynchronous coordination between implementations, specifications, and test cases. At some times implementations lag the specification; at others, the specification lags the implementations. The fact that some tools purporting to support polyglot lag behind the specification is not a sufficient reason to reject the polyglot spec, as long as those implementations intend to follow the standard once it stabilizes -- that's the reason for the CR phase and identifying exit criteria. So if there are some implementations that don't match the current spec, that is not a reason by itself to reject the spec; perhaps instead their conformance should be added to the "CR exit criteria".

Content-type does not partition the space of content. Content-type is descriptive metadata; it is a description of delivery intent, not of any intrinsic property. The *same* data stream can be delivered with different content-types, with potentially different intent. So the notion of "squatting" on content-types does not apply to polyglot. Polyglot instead is a kind of pun, where the same content can be delivered as multiple content-type values, with the intent of having the same (or at least very similar) results.
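
To make the "pun" concrete, here is a minimal sketch in Python, using only the standard library. It is an illustration under stated assumptions, not anything taken from the polyglot spec: the sample document, the TagCollector class and the element_names() helper are all hypothetical names invented for this example. The same string is handed to an HTML parser when labelled text/html and to an XML parser when labelled application/xhtml+xml, and the resulting element sequences are compared.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical sample document, written to be both well-formed XML and
# acceptable HTML (explicit namespace, lowercase names, "/>" on void elements).
POLYGLOT = (
    '<!DOCTYPE html>\n'
    '<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">'
    '<head><meta charset="utf-8"/><title>Pun</title></head>'
    '<body><p>Same bytes, two labels.</p><br/></body>'
    '</html>'
)


class TagCollector(HTMLParser):
    """Records element names in the order their start tags are seen."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # Also reached for "<br/>": the default handle_startendtag
        # delegates to handle_starttag and handle_endtag.
        self.tags.append(tag)


def element_names(body: str, content_type: str) -> list[str]:
    """Interpret `body` according to the declared content type (illustrative)."""
    if content_type == "application/xhtml+xml":
        root = ET.fromstring(body)
        # ElementTree reports names as "{namespace}local"; keep the local part.
        return [el.tag.rsplit("}", 1)[-1] for el in root.iter()]
    if content_type == "text/html":
        collector = TagCollector()
        collector.feed(body)
        collector.close()
        return collector.tags
    raise ValueError(f"unsupported content type: {content_type}")


as_html = element_names(POLYGLOT, "text/html")
as_xml = element_names(POLYGLOT, "application/xhtml+xml")
print(as_html == as_xml)
```

Note that Python's html.parser is a simple tokenizer rather than a conforming HTML5 parser, so this only shows the idea: the label selects the processing path, and a polyglot document is one for which both paths are intended to agree.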

> -----Original Message-----
> From: Kingsley Idehen [mailto:kidehen@openlinksw.com]
> Sent: Monday, January 21, 2013 10:26 PM
> To: David Sheets
> Cc: www-tag@w3.org
> Subject: Re: The non-polyglot elephant in the room
> On 1/21/13 4:15 PM, David Sheets wrote:
> > On Mon, Jan 21, 2013 at 11:47 AM, Kingsley Idehen
> > <kidehen@openlinksw.com> wrote:
> >> On 1/21/13 2:19 PM, Melvin Carvalho wrote:
> >>
> >> On 21 January 2013 20:13, Anne van Kesteren <annevk@annevk.nl> wrote:
> >>> On Mon, Jan 21, 2013 at 7:24 PM, Kingsley Idehen
> <kidehen@openlinksw.com>
> >>> wrote:
> >>>> Please correct me if my characterization is wrong, but it appears to me
> >>>> that
> >>>> this entire affair is about content-type (mime type) squatting i.e.,
> >>>> trying
> >>>> to squeeze (X)HTML into content-type: text/html. If this is true, why on
> >>>> earth would such an endeavor be encouraged by the W3C or its TAG?
> > How is the definition of *a valid subset of text/html* squatting?
> Is XHTML now a subset of HTML? Is (X)HTML a subset of HTML? As I stated,
> as part of my open comments, what am I missing in my characterization?
> >
> >>> Maybe because XML is listed quite prominently under "What is Web
> >>> architecture?" in http://www.w3.org/2004/10/27-tag-charter.html though
> >>> I would consider that particular part of the charter misguided. (It's
> >>> also not at all practiced these days.)
> > This is plainly false. Existence of new XML vocabularies demonstrates
> > practice. It cannot also be true that it is "not at all practiced
> > these days".
> >
> >> This is a good point, imho.  In 2004 it was perhaps reasonable to make a
> >> 'bet' on XML.  However, favouring any one particular serialization
> >> potentially lacks future proofing.  However, favouring the principles behind
> >> XML, such as namespacing etc.,  makes more sense.
> > Fragmentation is not future-proof.
> >
> >> Wikipedia has a reasonably nice write up on this topic:
> >>
> >> http://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats
> >>
> >>>
> >>>
> >>> --
> >>> http://annevankesteren.nl/
> >>>
> >>
> >> At this juncture though, my main question is about XHTML or (X)HTML (the
> >> polyglot) being squeezed into content-type designation: text/html. In
> >> reality we have two content types with distinct characteristics which
> >> thereby entails two distinct content-types: text/html (for HTML) and
> >> application/xhtml+xml (for XHTML).
> >>
> >> Put differently, there is no content-type for the (X)HTML polyglot. Thus, we
> >> have the struggle right now which is all about trying to make text/html the
> >> designated content-type for the aforementioned polyglot.
> > I was under the impression that an explicit goal of standardizing the
> > HTML5 parser was so that HTML consumers and producers could rely on a
> > de jure interpretation of nonsensical markup. While many consider
> > XML's restrictions nonsensical, it is prima facie absurd that
> > champions of HTML5's apologetic parser refuse to consider the subset
> > of HTML5 that is also valid XHTML5 as clearly important to a
> > population of authors.
> So this is the key point of contention i.e., XHTML5 (unlike other XHTML
> incarnations) is a genuine subset of HTML.
> >
> > From my perspective, anti-polyglot proponents advocate global
> > text/html interpretation of nearly everything *except* XHTML.
> Can you point me to an example? I ask primarily for clarity.
> > XHTML is
> > stricter than HTML and polyglot serializations *should* exist for any
> > DOM (at least one would hope, what with the complexity burden of a
> > fully conformant HTML parser).
> >
> > Are there legitimate technical architecture objections to specifying
> > the set intersection of XHTML and HTML expressions?
> Potentially, once you attempt to write parsers for HTML5 resources that
> include Microdata and/or RDFa structured data islands.
> >
> > I believe that there are many who would be interested in such
> > guidelines who are typically underrepresented in these discussions.
> >
> > I am genuinely confused by arguments which appear to encourage liberal
> > emission and deride conservative emission. Are web standards no longer
> > concerned with robustness? HTML's new parser specification appears to
> > disagree...
> Once there's clarification on the issue of HTML and XHTML5 subset, the
> problems will become clear. All you have to do is attempt to use or
> write a parser for structured data (MicroData, Microformats, RDFa)
> embedded in an HTML5 document.
> In my experience, undue burden is being pushed on the developers of
> parsers.
> >
> > Baffled,
> >
> > David Sheets
> >
> >
> --
> Regards,
> Kingsley Idehen
> Founder & CEO
> OpenLink Software
> Company Web: http://www.openlinksw.com
> Personal Weblog: http://www.openlinksw.com/blog/~kidehen
> Twitter/Identi.ca handle: @kidehen
> Google+ Profile: https://plus.google.com/112399767740508618350/about
> LinkedIn Profile: http://www.linkedin.com/in/kidehen
Received on Monday, 21 January 2013 22:32:48 UTC
