Re: The non-polyglot elephant in the room from Kingsley Idehen on 2013-01-21 (www-tag@w3.org from January 2013)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Mon, 21 Jan 2013 18:54:51 -0500
To: www-tag@w3.org
Message-ID: <50FDD54B.40501@openlinksw.com>
On 1/21/13 5:32 PM, Larry Masinter wrote:
> Recommendations should be appropriate for the scope that they claim to cover.
> A recommendation for HTML which claims to satisfy all of the existing use cases of HTML (for web pages, for email, for embedded devices) has a strong requirement to be appropriate for all of those use cases.
>
> However, the "scope" of polyglot is quite narrow: it is for a narrow set of applications which wish to serve the same content as text/html to HTML tools and as application/xhtml+xml to XML tools. As such, the fact that there are other use cases for which polyglot may not be appropriate is irrelevant.
>
> I don't think the objections raised are taking the claimed scope of the specifications into account.
>
> It is irrelevant whether the use cases are 2% or 40% of use cases currently; the only question should be whether there is sufficient interest.
>
> ===
> The standards process requires a linked but asynchronous coordination between implementations, specifications, and test cases.  For some period of time, implementations may lag the specification; at some times, the specifications lag behind the implementations.  The fact that some tools purporting to support polyglot lag behind the specification is not a sufficient reason to reject the polyglot spec, as long as those implementations intend to follow the standard once it stabilizes -- that's the reason for the CR phase and identifying exit criteria. So if there are some implementations that don't match the current spec, that is not a reason by itself to reject the spec; perhaps instead it should be added to the "CR exit criteria".
> ===
> Content-type does not partition the space of content. Content-type is descriptive metadata; it is a description of delivery intent, not of any intrinsic property. The *same* data stream can be delivered with different content-types, with potentially different intent. So the notion of "squatting" on content-types does not apply to polyglot. Polyglot instead is a kind of pun, where the same content can be delivered as multiple content-type values, with the intent of having the same (or at least very similar) results.

That means (I assume) serving each resource with the Content-type that
describes it, i.e., HTML and HTML5 as text/html, and XHTML5 as what? I
assume application/xhtml+xml?
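
As a rough illustration of the same bytes going out under either media
type, here is a minimal Python sketch. The helper name
choose_content_type is hypothetical, and the selection rule (use
application/xhtml+xml only when the client's Accept header lists it
with a q-value at least as high as text/html) is an assumption made for
the example, not something mandated by any spec discussed in this
thread.

```python
XHTML = "application/xhtml+xml"
HTML = "text/html"


def choose_content_type(accept_header: str) -> str:
    """Pick a media type for one polyglot byte stream (illustrative rule only)."""
    quality = {}
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media = parts[0].lower()
        q = 1.0  # a media range without an explicit q parameter defaults to 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        quality[media] = q

    # Assumed policy: serve as XHTML only if the client names it explicitly
    # and ranks it no lower than text/html; otherwise fall back to text/html.
    xhtml_q = quality.get(XHTML, 0.0)
    if xhtml_q > 0 and xhtml_q >= quality.get(HTML, 0.0):
        return XHTML
    return HTML


if __name__ == "__main__":
    print(choose_content_type("application/xhtml+xml,text/html;q=0.9"))
    print(choose_content_type("text/html,application/xhtml+xml;q=0.5"))
```

The document body is identical in both cases; only the Content-type
header sent with it changes.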

A question about which Content-type to use when I serve HTML5 via my
HTTP server to HTML and HTML5 user agents:

1. When I embed Microdata, I should indicate Content-type ___________ ?
2. When I embed Microformats, I should indicate Content-type ______________ ?
3. Ditto when I embed RDFa Lite: ____________ ?
4. Ditto when I embed RDFa: ________ ?

If all of the above should be text/html, then there is a burden on
HTML5 parser developers, and most will simply ignore XHTML5 (no matter
how hard one tries to squeeze it into HTML via text/html).
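
To make the burden concrete: if everything arrives as text/html, the
media type says nothing about which structured data syntax is embedded,
so a consumer has to inspect the markup itself. The Python sketch below
uses the standard library's html.parser to guess which syntaxes are
present. The class and function names are hypothetical, and the
attribute and class-name sets are small illustrative subsets chosen for
the example (attributes such as rel and content are left out because
plain HTML uses them too). It is a heuristic, not a conforming
Microdata, Microformats, or RDFa processor.

```python
from html.parser import HTMLParser

# Illustrative subsets only; real processors follow the full specifications.
MICRODATA_ATTRS = {"itemscope", "itemtype", "itemprop"}
RDFA_LITE_ATTRS = {"vocab", "typeof", "property", "resource", "prefix"}
RDFA_FULL_ONLY_ATTRS = {"about", "datatype", "inlist"}
MICROFORMATS1_ROOTS = {"vcard", "vevent", "hentry"}


class StructuredDataSniffer(HTMLParser):
    """Collects the names of structured data syntaxes that appear to be in use."""

    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases attribute names; valueless attributes have value None.
        names = {name for name, _ in attrs}
        if names & MICRODATA_ATTRS:
            self.found.add("Microdata")
        if names & RDFA_LITE_ATTRS:
            self.found.add("RDFa Lite")
        if names & RDFA_FULL_ONLY_ATTRS:
            self.found.add("RDFa")
        classes = (dict(attrs).get("class") or "").split()
        # Assumed heuristic: microformats2 roots start with "h-".
        if any(c in MICROFORMATS1_ROOTS or c.startswith("h-") for c in classes):
            self.found.add("Microformats")


def sniff(markup: str) -> set:
    """Return the set of syntaxes the heuristic detects in the given markup."""
    sniffer = StructuredDataSniffer()
    sniffer.feed(markup)
    sniffer.close()
    return sniffer.found


if __name__ == "__main__":
    print(sniff('<div itemscope itemtype="http://schema.org/Person">'
                '<span itemprop="name">Kingsley</span></div>'))
```

None of this sniffing would be needed if the Content-type alone
identified the embedded syntax, which is the point of the four
questions above.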

Kingsley
>
>> -----Original Message-----
>> From: Kingsley Idehen [mailto:kidehen@openlinksw.com]
>> Sent: Monday, January 21, 2013 10:26 PM
>> To: David Sheets
>> Cc: www-tag@w3.org
>> Subject: Re: The non-polyglot elephant in the room
>>
>> On 1/21/13 4:15 PM, David Sheets wrote:
>>> On Mon, Jan 21, 2013 at 11:47 AM, Kingsley Idehen
>>> <kidehen@openlinksw.com> wrote:
>>>> On 1/21/13 2:19 PM, Melvin Carvalho wrote:
>>>>
>>>> On 21 January 2013 20:13, Anne van Kesteren <annevk@annevk.nl> wrote:
>>>>> On Mon, Jan 21, 2013 at 7:24 PM, Kingsley Idehen
>> <kidehen@openlinksw.com>
>>>>> wrote:
>>>>>> Please correct me if my characterization is wrong, but it appears to me
>>>>>> that
>>>>>> this entire affair is about content-type (mime type) squatting i.e.,
>>>>>> trying
>>>>>> to squeeze (X)HTML into content-type: text/html. If this is true, why on
>>>>>> earth would such an endeavor be encouraged by the W3C or its TAG?
>>> How is the definition of *a valid subset of text/html* squatting?
>> Is XHTML now a subset of HTML? Is (X)HTML a subset of HTML? As I stated,
>> as part of my open comments, what am I missing in my characterization?
>>
>>>>> Maybe because XML is listed quite prominently under "What is Web
>>>>> architecture?" in http://www.w3.org/2004/10/27-tag-charter.html though
>>>>> I would consider that particular part of the charter misguided. (It's
>>>>> also not at all practiced these days.)
>>> This is plainly false. Existence of new XML vocabularies demonstrates
>>> practice. It cannot also be true that it is "not at all practiced
>>> these days".
>>>
>>>> This is a good point, imho.  In 2004 it was perhaps reasonable to make a
>>>> 'bet' on XML.  However, favouring any one particular serialization
>>>> potentially lacks future proofing.  However, favouring the principles behind
>>>> XML, such as namespacing etc.,  makes more sense.
>>> Fragmentation is not future-proof.
>>>
>>>> Wikipedia has a reasonably nice write up on this topic:
>>>>
>>>> http://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats
>>>>
>>>>>
>>>>> --
>>>>> http://annevankesteren.nl/
>>>>>
>>>> At this juncture though, my main question is about XHTML or (X)HTML (the
>>>> polyglot) being squeezed into content-type designation: text/html. In
>>>> reality we have two content types with distinct characteristics which
>>>> thereby entails two distinct content-types: text/html (for HTML) and
>>>> application/xhtml+xml (for XHTML).
>>>>
>>>> Put differently, there is no content-type for the (X)HTML polyglot. Thus, we
>>>> have the struggle right now which is all about trying to make text/html the
>>>> designated content-type for the aforementioned polyglot.
>>> I was under the impression that an explicit goal of standardizing the
>>> HTML5 parser was so that HTML consumers and producers could rely on a
>>> de jure interpretation of nonsensical markup. While many consider
>>> XML's restrictions nonsensical, it is prima facie absurd that
>>> champions of HTML5's apologetic parser refuse to consider the subset
>>> of HTML5 that is also valid XHTML5 as clearly important to a
>>> population of authors.
>>
>> So this is the key point of contention i.e., XHTML5 (unlike other XHTML
>> incarnations) is a genuine subset of HTML.
>>
>>> From my perspective, anti-polyglot proponents advocate global
>>> text/html interpretation of nearly everything *except* XHTML.
>> Can you point me to an example? I ask primarily for clarity.
>>
>>> XHTML is
>>> stricter than HTML and polyglot serializations *should* exist for any
>>> DOM (at least one would hope, what with the complexity burden of a
>>> fully conformant HTML parser).
>>>
>>> Are there legitimate technical architecture objections to specifying
>>> the set intersection of XHTML and HTML expressions?
>> Potentially, once you attempt to write parsers for HTML5 resources that
>> include Microdata and/or RDFa structured data islands.
>>
>>> I believe that there are many who would be interested in such
>>> guidelines who are typically underrepresented in these discussions.
>>>
>>> I am genuinely confused by arguments which appear to encourage liberal
>>> emission and deride conservative emission. Are web standards no longer
>>> concerned with robustness? HTML's new parser specification appears to
>>> disagree...
>> Once there's clarification on the issue of the HTML and XHTML5 subset,
>> the problems will become clear. All you have to do is attempt to use or
>> write a parser for structured data (Microdata, Microformats, RDFa)
>> embedded in an HTML5 document.
>>
>> In my experience, undue burden is being pushed on the developers of
>> parsers.
>>
>>> Baffled,
>>>
>>> David Sheets
>>>
>>>
>>
>> --
>>
>> Regards,
>>
>> Kingsley Idehen
>> Founder & CEO
>> OpenLink Software
>> Company Web: http://www.openlinksw.com
>> Personal Weblog: http://www.openlinksw.com/blog/~kidehen
>> Twitter/Identi.ca handle: @kidehen
>> Google+ Profile: https://plus.google.com/112399767740508618350/about
>> LinkedIn Profile: http://www.linkedin.com/in/kidehen
>>


Received on Monday, 21 January 2013 23:55:15 UTC