Re: The non-polyglot elephant in the room from Kingsley Idehen on 2013-01-21 (www-tag@w3.org from January 2013)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Mon, 21 Jan 2013 16:25:36 -0500
To: David Sheets <kosmo.zb@gmail.com>
CC: www-tag@w3.org
Message-ID: <50FDB250.3000702@openlinksw.com>
On 1/21/13 4:15 PM, David Sheets wrote:
> On Mon, Jan 21, 2013 at 11:47 AM, Kingsley Idehen
> <kidehen@openlinksw.com> wrote:
>> On 1/21/13 2:19 PM, Melvin Carvalho wrote:
>>
>> On 21 January 2013 20:13, Anne van Kesteren <annevk@annevk.nl> wrote:
>>> On Mon, Jan 21, 2013 at 7:24 PM, Kingsley Idehen <kidehen@openlinksw.com>
>>> wrote:
>>>> Please correct me if my characterization is wrong, but it appears to me
>>>> that
>>>> this entire affair is about content-type (mime type) squatting i.e.,
>>>> trying
>>>> to squeeze (X)HTML into content-type: text/html. If this is true, why on
>>>> earth would such an endeavor be encouraged by the W3C or its TAG?
> How is the definition of *a valid subset of text/html* squatting?

Is XHTML now a subset of HTML? Is (X)HTML a subset of HTML? As I asked 
in my opening comments: what am I missing in my characterization?

>
>>> Maybe because XML is listed quite prominently under "What is Web
>>> architecture?" in http://www.w3.org/2004/10/27-tag-charter.html though
>>> I would consider that particular part of the charter misguided. (It's
>>> also not at all practiced these days.)
> This is plainly false. Existence of new XML vocabularies demonstrates
> practice. It cannot also be true that it is "not at all practiced
> these days".
>
>> This is a good point, imho.  In 2004 it was perhaps reasonable to make a
>> 'bet' on XML.  However, favouring any one particular serialization
>> potentially lacks future proofing.  However, favouring the principles behind
>> XML, such as namespacing etc.,  makes more sense.
> Fragmentation is not future-proof.
>
>> Wikipedia has a reasonably nice write up on this topic:
>>
>> http://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats
>>
>>>
>>>
>>> --
>>> http://annevankesteren.nl/
>>>
>>
>> At this juncture though, my main question is about XHTML or (X)HTML (the
>> polyglot) being squeezed into content-type designation: text/html. In
>> reality we have two content types with distinct characteristics which
>> thereby entails two distinct content-types: text/html (for HTML) and
>> application/xhtml+xml (for XHTML).
>>
>> Put differently, there is no content-type for the (X)HTML polyglot. Thus, we
>> have the struggle right now which is all about trying to make text/html the
>> designated content-type for the aforementioned polyglot.
> I was under the impression that an explicit goal of standardizing the
> HTML5 parser was so that HTML consumers and producers could rely on a
> de jure interpretation of nonsensical markup. While many consider
> XML's restrictions nonsensical, it is prima facie absurd that
> champions of HTML5's apologetic parser refuse to consider the subset
> of HTML5 that is also valid XHTML5 as clearly important to a
> population of authors.


So this is the key point of contention, i.e., the claim that XHTML5 
(unlike other XHTML incarnations) is a genuine subset of HTML.
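
To make the subset question concrete, here is a minimal sketch that feeds 
the same markup to a lenient HTML tokenizer and to a strict XML parser. 
The sample documents and the helper names (TagCollector, html_tags, 
xml_wellformed) are made up for illustration. Python's html.parser is 
not a conformant HTML5 parser; it only stands in for "a forgiving 
text/html consumer" here.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical sample: written to satisfy both an HTML and an XML parser.
POLYGLOT = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head><title>t</title></head>'
    '<body><p>Hello<br/>world</p></body></html>'
)

# Hypothetical sample: fine as text/html, but not well-formed XML
# (unclosed <br> and <p>).
HTML_ONLY = (
    '<html><head><title>t</title></head>'
    '<body><p>Hello<br>world</body></html>'
)


class TagCollector(HTMLParser):
    """Records start tags in document order (lenient tokenizer)."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_tags(text):
    """Start tags as seen by the lenient HTML tokenizer."""
    collector = TagCollector()
    collector.feed(text)
    collector.close()
    return collector.tags


def xml_wellformed(text):
    """True if a strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    for label, doc in (("polyglot", POLYGLOT), ("html-only", HTML_ONLY)):
        print(label, html_tags(doc), xml_wellformed(doc))
```

Both samples yield the same tag sequence on the HTML side; only the 
first also passes the XML parser. That intersection is what the polyglot 
discussion is about.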

>
> From my perspective, anti-polyglot proponents advocate global
> text/html interpretation of nearly everything *except* XHTML.

Can you point me to an example? I ask primarily for clarity.

> XHTML is
> stricter than HTML and polyglot serializations *should* exist for any
> DOM (at least one would hope, what with the complexity burden of a
> fully conformant HTML parser).
>
> Are there legitimate technical architecture objections to specifying
> the set intersection of XHTML and HTML expressions?

Potentially, once you attempt to write parsers for HTML5 resources that 
include Microdata and/or RDFa structured data islands.

>
> I believe that there are many who would be interested in such
> guidelines who are typically underrepresented in these discussions.
>
> I am genuinely confused by arguments which appear to encourage liberal
> emission and deride conservative emission. Are web standards no longer
> concerned with robustness? HTML's new parser specification appears to
> disagree...

Once the question of whether XHTML5 is a subset of HTML is clarified, 
the problems will become clear. All you have to do is attempt to use or 
write a parser for structured data (Microdata, Microformats, RDFa) 
embedded in an HTML5 document.

In my experience, an undue burden is being pushed onto the developers 
of parsers.
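
As a rough illustration of that burden, the sketch below pulls Microdata 
itemprop values out of one small document twice: once through an HTML 
tokenizer and once through an XML parser. The sample markup and the 
function names are hypothetical, and Python's html.parser is not a 
conformant HTML5 parser. The point is only that a structured data 
consumer ends up maintaining two code paths, with different namespace 
and tag-name handling, for what is nominally the same content.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical Microdata fragment, written to be well-formed XML too.
DOC = (
    '<div xmlns="http://www.w3.org/1999/xhtml" itemscope="">'
    '<span itemprop="name">Example</span>'
    '<a itemprop="url" href="http://example.org/">home</a>'
    '</div>'
)


class ItempropCollector(HTMLParser):
    """HTML path: tag names arrive lowercased and without a namespace."""

    def __init__(self):
        super().__init__()
        self.props = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "itemprop":
                self.props.append((tag, value))


def itemprops_html(text):
    collector = ItempropCollector()
    collector.feed(text)
    collector.close()
    return collector.props


def itemprops_xml(text):
    """XML path: tag names arrive as '{namespace}local' and need stripping."""
    props = []
    for element in ET.fromstring(text).iter():
        value = element.get("itemprop")
        if value is not None:
            local_name = element.tag.rsplit("}", 1)[-1]
            props.append((local_name, value))
    return props


if __name__ == "__main__":
    print(itemprops_html(DOC))
    print(itemprops_xml(DOC))
```

The two functions agree on this input only because the markup was 
written carefully; with markup served as text/html that is not 
well-formed XML, the second path fails outright.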

>
> Baffled,
>
> David Sheets
>
>


-- 

Regards,

Kingsley Idehen	
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Received on Monday, 21 January 2013 21:25:59 UTC