Re: NU’s polyglot possibilities (Was: The non-polyglot elephant in the room) from Kingsley Idehen on 2013-01-25 (www-tag@w3.org from January 2013)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Fri, 25 Jan 2013 08:57:07 -0500
To: www-tag@w3.org
Message-ID: <51028F33.6050909@openlinksw.com>

On 1/25/13 7:55 AM, Henri Sivonen wrote:
> On Fri, Jan 25, 2013 at 12:14 AM, Alex Russell <slightlyoff@google.com> wrote:
>> What am I missing?
> You aren't missing anything.
>
>> Under what conditions can the expectations of producers
>> and consumers of polyglot documents be simplified by the addition of
>> polyglot markup to their existing world/toolchain?
> 1) The producer wants to maintain a document as a single file of bytes
> on an HTTP server that serves from the file system.
> AND
> 2) The producer wants to serve those bytes as text/html to cater to
> the general public—including IE8.
> AND
> 3) The producer wants to facilitate a non-browser consumer that
>    a) Does not possess a conforming HTML parser.
> AND
>    b) Possesses an XML parser or a non-conforming HTML parser that
> happens to barf less if the input is XML-like.
> AND
>    c) Is not seeking to consume Web content in general (as that would
> necessitate violating condition a).
> AND
>    d) Has a line of communication back to the producer in order to
> complain when the document inevitably becomes non-polyglot by accident
> as a result of an edit.
>
> So a very narrow case. Not worth a REC, in my opinion. A solution in
> search of a problem.
>
We assume this to be a standard (X)HTML5 polyglot snippet:
<!DOCTYPE html>
<html lang="" xmlns="http://www.w3.org/1999/xhtml" xml:lang="">
   <head>
     <title>Polyglot Test</title>
     <meta name="generator" content="BlueGriffon wysiwyg editor" />
     <meta charset="UTF-8" />
   </head>
<body>
</body>
</html>

Here is one example from the wild of an HTML5 polyglot document that
includes an RDFa-based structured data island:
http://schema.org/docs/schema_org_rdfa.html .

Please note this excerpt from the above:
<!DOCTYPE html>
<html>
   <head>
     <title>RDFa Lite Reflection</title>
   </head>
   <body>
     <h1>Schema.org core schema</h1>
     <p>This is an <b>experimental</b> RDFa 1.1 Lite representation of 
the schema.org schema, copied here for collaboration and <a 
href="mailto:public-vocabs@w3.org">feedback</a>.</p>
     <hr />
     <div typeof="rdfs:Class" resource="http://schema.org/Thing">
       <span class="h" property="rdfs:label">Thing</span>
       <span property="rdfs:comment">The most generic type of item.</span>
     </div>
     </body>
</html>

Here is the cURL output:
curl -I http://schema.org/docs/schema_org_rdfa.html
HTTP/1.1 200 OK
Vary: Accept-Encoding
Content-Type: text/html


As you can see from the above, we have Content-Type squatting: XHTML5
is being packed into HTML via the so-called (X)HTML5 polyglot and served
as text/html, which ultimately forces the developer of a parser (or any
other consumer) to sniff the content if it seeks to generate an RDF-based
Linked Data graph from this document.
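
To make the sniffing burden concrete, here is a minimal sketch of what a consumer ends up doing when all it gets is text/html. The function name, the return labels, and the shortened sample documents are my own illustration (not taken from any existing parser), and it uses only the Python standard library:

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Shortened versions of the two snippets above (illustrative only).
POLYGLOT = """<!DOCTYPE html>
<html lang="" xmlns="http://www.w3.org/1999/xhtml" xml:lang="">
  <head><title>Polyglot Test</title><meta charset="UTF-8" /></head>
  <body></body>
</html>"""

SCHEMA_EXCERPT = """<!DOCTYPE html>
<html>
  <head><title>RDFa Lite Reflection</title></head>
  <body>
    <hr />
    <div typeof="rdfs:Class" resource="http://schema.org/Thing">
      <span property="rdfs:label">Thing</span>
    </div>
  </body>
</html>"""


def sniff_markup(content_type, body):
    """Guess how a document should be parsed (hypothetical helper)."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/xhtml+xml":
        # The server told us: no sniffing required.
        return "xhtml"
    if media_type != "text/html":
        return "unknown"
    # text/html tells us nothing about XML-ness, so we have to look inside.
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return "html"
    if root.tag == "{%s}html" % XHTML_NS:
        return "polyglot-candidate"
    # Well-formed XML, but the root is not in the XHTML namespace.
    return "well-formed-no-xmlns"


if __name__ == "__main__":
    print(sniff_markup("text/html", POLYGLOT))
    print(sniff_markup("text/html", SCHEMA_EXCERPT))
```

Note that the schema.org excerpt happens to be well-formed XML yet carries no @xmlns, so even after sniffing the consumer is left guessing which processing rules the producer intended.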

As I've already stated (and I still await a convincing response from
the TAG), this is just wrong.

XHTML5 != HTML5. Each should have its own MIME type. Otherwise, we have
to be crystal clear about the fact that an (X)HTML5 polyglot must carry
the @xmlns attribute on its <html> element, as in:
<html lang="" xmlns="http://www.w3.org/1999/xhtml" xml:lang=""> .
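
For contrast, here is a rough sketch of how simple things become for a consumer when the media type alone decides the parser. It assumes XHTML5 arrives as application/xhtml+xml (the registered XML media type for XHTML) and HTML5 as text/html; the class and function names are hypothetical, and only the Python standard library is used:

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class RootTagParser(HTMLParser):
    """Records the name of the first start tag seen (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.root = None

    def handle_starttag(self, tag, attrs):
        if self.root is None:
            self.root = tag


def parse_by_media_type(content_type, body):
    """Pick a parser from the Content-Type header alone, with no sniffing."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/xhtml+xml":
        # XML rules apply: a non-well-formed body raises ET.ParseError.
        root = ET.fromstring(body)
        return ("xml", root.tag)
    if media_type == "text/html":
        # HTML rules apply: the stdlib parser is lenient (it is not a
        # conforming HTML5 parser, which is an assumption of this sketch).
        parser = RootTagParser()
        parser.feed(body)
        parser.close()
        return ("html", parser.root)
    raise ValueError("unsupported media type: " + media_type)


if __name__ == "__main__":
    doc = ('<html lang="" xmlns="http://www.w3.org/1999/xhtml" xml:lang="">'
           '<head><title>Polyglot Test</title></head><body></body></html>')
    print(parse_by_media_type("application/xhtml+xml", doc))
    print(parse_by_media_type("text/html", doc))
```

On the XML path the @xmlns declaration surfaces as the namespace of the root element, whereas on the text/html path it is just another attribute, which is why the same bytes can mean different things to different consumers.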

I specifically picked a schema.org example because it exemplifies the
bigger problem with (X)HTML5 and RDFa.

-- 

Regards,

Kingsley Idehen 
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen

Received on Friday, 25 January 2013 13:57:37 UTC