Re: RDFa in HTML 5 from Philip Taylor on 2009-05-24 (public-html@w3.org from May 2009)

From: Philip Taylor <pjt47@cam.ac.uk>
Date: Sun, 24 May 2009 20:29:31 +0100
To: Maciej Stachowiak <mjs@apple.com>
CC: Shelley Powers <shelleyp@burningbird.net>, Sam Ruby <rubys@intertwingly.net>, Manu Sporny <msporny@digitalbazaar.com>, RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>, HTML WG <public-html@w3.org>
Message-ID: <4A19A01B.5070402@cam.ac.uk>

Maciej Stachowiak wrote:
> [...]
> 
> 2) An offline processor written in Python may treat XHTML served as 
> text/html as XML, since there are so many off-the-shelf XML parsing 
> libraries and the script author may be unaware of the off-the-shelf 
> HTML5 parsers now available.

Some do that even when their authors are aware of off-the-shelf HTML5 
parsers - e.g. pyRdfa ignores any Content-Type and always tries parsing 
with an XML parser first, and if that fails then it (optionally) falls 
back to html5lib.
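
As a rough sketch of that XML-first strategy (this is not pyRdfa's 
actual code; parse_xml_first and html_fallback are names made up for 
illustration, and the standard-library ElementTree parser stands in for 
whatever XML parser a real processor uses):

```python
import xml.etree.ElementTree as ET


def parse_xml_first(markup, html_fallback=None):
    """Try an XML parser first, whatever the Content-Type says.

    html_fallback is a hypothetical hook: any callable that takes the
    markup string and returns a tree built by an HTML parser.
    Returns a (parser_name, tree) pair.
    """
    try:
        return "xml", ET.fromstring(markup)
    except ET.ParseError:
        if html_fallback is None:
            # No HTML parser configured, so the XML error propagates.
            raise
        return "html", html_fallback(markup)
```

If html5lib is installed, its html5lib.parse function could be passed 
as html_fallback. The point is that the Content-Type never enters into 
the decision: well-formed input always goes down the XML path.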

> If there is any difference between 
> text/html and application/xml processing rules for the same document, 
> this will almost certainly result in divergence in at least some cases. 
> Thus, we need to do at least one of ensuring identical processing, or 
> make it very clear that text/html must never be processed as XML by an 
> RDFa processor.

We've already failed at ensuring identical processing, because of 
parsing differences - e.g. if I write

   <p about="..." /> <span property="..."> ... </span>

then in XML it parses to sibling elements, but in HTML it parses to 
parent/child (because the trailing slash on <p> is ignored, so the <p> 
stays open and the <span> becomes its child). If some processors 
unconditionally parse text/html content with an XML parser, they'll give 
different results from processors that correctly use a text/html parser, 
and that breaks interoperability.
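
The difference can be reproduced with the Python standard library 
alone. This is only a sketch: TinyHTMLTree is a made-up, heavily 
simplified tree builder layered on html.parser, not a conforming HTML5 
parser. It models just the one rule that matters here, namely that a 
trailing slash on a non-void element is ignored. The wrapping <root> 
element is also an assumption, added so the fragment is well-formed XML.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Void elements never have children, so they are never left open.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class TinyHTMLTree(HTMLParser):
    """Simplified tree builder; only models the trailing-slash rule."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        attrib = {name: value or "" for name, value in attrs}
        element = ET.SubElement(self.stack[-1], tag, attrib)
        if tag not in VOID:
            self.stack.append(element)

    def handle_startendtag(self, tag, attrs):
        # In text/html the "/" is ignored: treat it as a plain start tag.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, if there is one.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break


markup = '<p about="..." /> <span property="..."> ... </span>'

# XML view: <p/> is an empty element, so <span> is its sibling.
xml_root = ET.fromstring("<root>" + markup + "</root>")

# HTML view (simplified): <p> stays open, so <span> is its child.
builder = TinyHTMLTree()
builder.feed(markup)
builder.close()
html_root = builder.root

xml_children = [child.tag for child in xml_root]
html_children = [child.tag for child in html_root]
html_grandchildren = [child.tag for child in html_root[0]]
```

Here xml_children lists both elements at the top level, while 
html_children contains only the <p>, with the <span> showing up in 
html_grandchildren. An RDFa processor walking those two trees would 
see the property attribute in different subject contexts.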

-- 
Philip Taylor
pjt47@cam.ac.uk

Received on Sunday, 24 May 2009 19:30:15 UTC