Re: The non-polyglot elephant in the room

On 1/21/13 5:16 PM, David Sheets wrote:
> On Mon, Jan 21, 2013 at 1:25 PM, Kingsley Idehen <kidehen@openlinksw.com> wrote:
>> On 1/21/13 4:15 PM, David Sheets wrote:
>>> On Mon, Jan 21, 2013 at 11:47 AM, Kingsley Idehen
>>> <kidehen@openlinksw.com> wrote:
>>>> On 1/21/13 2:19 PM, Melvin Carvalho wrote:
>>>>
>>>> On 21 January 2013 20:13, Anne van Kesteren <annevk@annevk.nl> wrote:
>>>>> On Mon, Jan 21, 2013 at 7:24 PM, Kingsley Idehen
>>>>> <kidehen@openlinksw.com>
>>>>> wrote:
>>>>>> Please correct me if my characterization is wrong, but it appears to me
>>>>>> that
>>>>>> this entire affair is about content-type (mime type) squatting i.e.,
>>>>>> trying
>>>>>> to squeeze (X)HTML into content-type: text/html. If this is true, why
>>>>>> on
>>>>>> earth would such an endeavor be encouraged by the W3C or its TAG?
>>> How is the definition of *a valid subset of text/html* squatting?
>>
>> Is XHTML now a subset of HTML? Is (X)HTML a subset of HTML? As I stated, as
>> part of my open comments, what am I missing in my characterization?
> It's not clear to me that they have that relation. There does exist a
> subset of HTML that is also XHTML and vice versa.
>
>>>>> Maybe because XML is listed quite prominently under "What is Web
>>>>> architecture?" in http://www.w3.org/2004/10/27-tag-charter.html though
>>>>> I would consider that particular part of the charter misguided. (It's
>>>>> also not at all practiced these days.)
>>> This is plainly false. Existence of new XML vocabularies demonstrates
>>> practice. It cannot also be true that it is "not at all practiced
>>> these days".
>>>
>>>> This is a good point, imho.  In 2004 it was perhaps reasonable to make a
>>>> 'bet' on XML.  However, favouring any one particular serialization
>>>> potentially lacks future proofing.  However, favouring the principles
>>>> behind
>>>> XML, such as namespacing etc.,  makes more sense.
>>> Fragmentation is not future-proof.
>>>
>>>> Wikipedia has a reasonably nice write up on this topic:
>>>>
>>>> http://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats
>>>>
>>>>>
>>>>> --
>>>>> http://annevankesteren.nl/
>>>>>
>>>> At this juncture though, my main question is about XHTML or (X)HTML (the
>>>> polyglot) being squeezed into content-type designation: text/html. In
>>>> reality we have two content types with distinct characteristics which
>>>> thereby entails two distinct content-types: text/html (for HTML) and
>>>> application/xhtml+xml (for XHTML).
>>>>
>>>> Put differently, there is no content-type for the (X)HTML polyglot. Thus,
>>>> we
>>>> have the struggle right now which is all about trying to make text/html
>>>> the
>>>> designated content-type for the aforementioned polyglot.
>>> I was under the impression that an explicit goal of standardizing the
>>> HTML5 parser was so that HTML consumers and producers could rely on a
>>> de jure interpretation of nonsensical markup. While many consider
>>> XML's restrictions nonsensical, it is prima facie absurd that
>>> champions of HTML5's apologetic parser refuse to consider the subset
>>> of HTML5 that is also valid XHTML5 as clearly important to a
>>> population of authors.
>> So this is the key point of contention i.e., XHTML5 (unlike other XHTML
>> incarnations) is a genuine subset of HTML.
> I don't believe they have this relation. There is a set of documents
> that satisfies both standards, however.
>
>>> From my perspective, anti-polyglot proponents advocate global
>>> text/html interpretation of nearly everything *except* XHTML.
>> Can you point me to an example? I ask primarily for clarity.
> I'm not sure which assertion you'd like an example for so I've made
> some guesses.
>
> <http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#parsing>
> describes a very lenient HTML parser which defines an interpretation
> for many strange text/html documents ("Hello, <I><b>world</i>!").
>
> Advocacy for this lenient parser has been prevalent for several years.
> The same standard provides guidance on writing "good" HTML which does
> not take advantage of all of the HTML parser's quirks. This is a
> subset of HTML.
>
> For evidence of resistance to the standard definition of the subset of
> HTML that is also XHTML with the same meaning, you need only look to
> this and previous threads on this topic.
>
>>> XHTML is
>>> stricter than HTML and polyglot serializations *should* exist for any
>>> DOM (at least one would hope, what with the complexity burden of a
>>> fully conformant HTML parser).
>>>
>>> Are there legitimate technical architecture objections to specifying
>>> the set intersection of XHTML and HTML expressions?
>>
>> Potentially, once you attempt to write parsers for HTML5 resources that
>> include Microdata and/or RDFa structured data islands.
> How does the definition of a mutually compatible subset complicate
> HTML5 parsers which include microdata/rdfa? By definition, the
> polyglot subset must work in existing HTML5 parsers. If HTML5 semantic
> markup cannot share syntax with XHTML, the document cannot be
> serialized in a polyglot fashion. What have I missed?
>
>>> I believe that there are many who would be interested in such
>>> guidelines who are typically underrepresented in these discussions.
>>>
>>> I am genuinely confused by arguments which appear to encourage liberal
>>> emission and deride conservative emission. Are web standards no longer
>>> concerned with robustness? HTML's new parser specification appears to
>>> disagree...
>> Once there's clarification on the issue of HTML and XHTML5 subset, the
>> problems will become clear. All you have to do is attempt to use or write a
>> parser for structured data (MicroData, Microformats, RDFa) embedded in an
>> HTML5 document.
> I began just this task and have since put it on hold due to the
> complexity of the HTML5 parsing algorithm and personal time
> constraints. If a document is both valid HTML and valid XHTML5, how is
> handling this content harder than just handling HTML5 content?
>
> There may certainly be important classes of documents for which no
> polyglot serialization is possible. Unique HTML features, unique XHTML
> features, and the HTML/XHTML overlap are all important for author
> education. Aesthetically, I would like as few special cases as possible
> for each of HTML and XHTML, and greater syntax compatibility; but with a
> faction advocating extensive bugwards compatibility and showing no
> interest in unification, I'm not holding my breath.
>
>> In my experience, undue burden is being pushed on the developers of parsers.
> Which parser developers does a polyglot spec burden? Polyglot should
> be parsable by both HTML5 and XML parsers without modification.
>
Yes, but HTML5 parser developers think in terms of HTML. They don't
think about XML in any shape or form. The net effect is that XML-related
rules are kicked to the curb.
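
To make the overlap under discussion concrete, here is a minimal Python
sketch. It is an illustration under stated assumptions, not anyone's
actual parser: Python's html.parser is a lenient tokenizer, not a
conformant HTML5 tree builder, and the helper names (TagCollector,
html_events, xml_events) are hypothetical. It compares the tag events
seen by an XML parser and by the lenient HTML tokenizer for a
well-formed fragment and for the misnested example quoted above.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TagCollector(HTMLParser):
    """Record start/end tag events from Python's lenient HTML tokenizer.

    Note: this is NOT the HTML5 parsing algorithm; it performs no tree
    construction or error recovery beyond tokenizing tags.
    """

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def html_events(markup):
    """Return the tag events the lenient HTML tokenizer reports."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.events


def xml_events(markup):
    """Return tag events from an XML parser, or None if not well-formed."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:
        return None

    events = []

    def walk(elem):
        # Depth-first walk reproduces the start/end order of the source.
        events.append(("start", elem.tag))
        for child in elem:
            walk(child)
        events.append(("end", elem.tag))

    walk(root)
    return events


# Well-formed fragment with lowercase tags: acceptable to both sides.
polyglot = "<p>Hello, <i><b>world</b></i>!</p>"

# The misnested example from the thread, wrapped in <p> for a single root.
misnested = "<p>Hello, <I><b>world</i>!</p>"
```

For the well-formed fragment, html_events(polyglot) and
xml_events(polyglot) report the same sequence of tags. For the misnested
fragment, xml_events returns None because the XML parser rejects it,
while the HTML tokenizer still reports events for it.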

-- 

Regards,

Kingsley Idehen	
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen

Received on Monday, 21 January 2013 23:36:41 UTC