Re: Scope of W3C recommendations; core issue for polyglot & DRM from John Kemp on 2013-01-29 (www-tag@w3.org from January 2013)

From: John Kemp <john@jkemp.net>
Date: Tue, 29 Jan 2013 08:40:48 -0500
To: Kingsley Idehen <kidehen@openlinksw.com>
CC: www-tag@w3.org
Message-ID: <5107D160.1020603@jkemp.net>

On 01/29/2013 08:03 AM, Kingsley Idehen wrote:
> FWIW -- We've already been through the hell I am trying to prevent
> others from embarking upon. We have more than 100 transformers for a
> variety of XML-based data sources (including XHTML and HTML5), and I
> know that what we've been through isn't something others will embark upon.
>
> My fundamental concern: most (X)HTML5 is published without appropriate
> hints to processors. Thus, you have to sniff the content.
>
> If you haven't attempted to extract a Microdata, RDFa, or Microformats data
> island from an (X)HTML5 document, you won't be aware of these problems
> out in the field. In addition, beyond schema.org, most online retailers
> (as a consequence of schema.org) are producing the kind of problematic
> (X)HTML5 that I describe.
>

It sounds like you are well aware of the problem, then. People try to
do something they want to do, and they all do it in different ways. A
consumer cannot reliably know what to do with this data, since
the problem you mention is completely undocumented without a "polyglot
specification".
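
For illustration only, the kind of content sniffing described in the quoted
text might look like the sketch below. It uses Python's standard html.parser
module. The name sniff_syntaxes and the marker lists are assumptions made up
for this example; they are not taken from Kingsley's transformers or from any
specification, and they are deliberately incomplete.

```python
from html.parser import HTMLParser

# Hypothetical, non-exhaustive marker lists chosen for this sketch.
MICRODATA_ATTRS = {"itemscope", "itemtype", "itemprop"}
RDFA_ATTRS = {"vocab", "typeof", "property", "prefix"}
MICROFORMAT_CLASSES = {
    "vcard", "hentry", "vevent", "hproduct",       # classic microformats
    "h-card", "h-entry", "h-event", "h-product",   # microformats2
}


class SyntaxSniffer(HTMLParser):
    """Records which structured-data syntaxes appear to be present."""

    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases attribute names; valueless attributes
        # such as a bare "itemscope" arrive with value None.
        names = {name for name, _ in attrs}
        if names & MICRODATA_ATTRS:
            self.found.add("microdata")
        if names & RDFA_ATTRS:
            self.found.add("rdfa")
        for name, value in attrs:
            if name == "class" and value:
                if set(value.split()) & MICROFORMAT_CLASSES:
                    self.found.add("microformats")


def sniff_syntaxes(markup):
    """Return the set of syntaxes whose markers were seen in markup."""
    sniffer = SyntaxSniffer()
    sniffer.feed(markup)
    sniffer.close()
    return sniffer.found
```

An empty result means the document gave the consumer no hint at all, and a
non-empty one is still only a guess: an attribute such as "property" also
appears in markup that was never meant as RDFa. That ambiguity is the
undocumented behavior in question.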

Is your advice "don't do this"? If so, what would you have them do 
instead? If your advice is "do it this way", where is the specification 
that shows this way?

JohnK

Received on Tuesday, 29 January 2013 13:41:18 UTC