W3C home > Mailing lists > Public > public-xg-webid@w3.org > January 2012

Re: rdfa parsing issue -- was: fixed https://foafssl.org/test/WebId

From: Henry Story <henry.story@bblfish.net>
Date: Fri, 6 Jan 2012 19:37:52 +0100
Cc: Damian Steer <pldms@mac.com>, Jürgen Jakobitsch <j.jakobitsch@semantic-web.at>, "public-xg-webid@w3.org XG" <public-xg-webid@w3.org>
Message-Id: <B33FE4CD-3D40-43B8-9463-4D69ADA23E17@bblfish.net>
To: Stéphane Corlosquet <scorlosquet@gmail.com>
Yes, well, this really should be documented on our wiki. This is the kind of thing that 
is going to turn up again and again. I even wonder whether it should go in the spec as a note.

Henry


On 6 Jan 2012, at 17:57, Stéphane Corlosquet wrote:

> 
> 
> On Fri, Jan 6, 2012 at 9:57 AM, Damian Steer <pldms@mac.com> wrote:
> Hi Henry and Jürgen,
> 
> On 06/01/12 12:49, Henry Story wrote:
> 
> > Shellac's parser parses the xhtml correctly as xhtml in fact, but
> > when the html parser is used it comes to a different conclusion.
> 
> Yes, this is becoming a classic issue, and has nothing to do with RDFa
> (although RDFa obscures the issue horribly).
> 
> > RDFa 1.0 is, as I understand it, defined only for xhtml, so it is true that we
> > are going beyond what the spec covers by trying to parse html too. Perhaps
> > this will be much simpler with RDFa 1.1, which can be made to work
> > with html5.
> 
> Yes, RDFa 1.0 is only really defined for xhtml, although useful work was
> done on html 5 at the time (there are some html 5 tests). RDFa 1.1 does
> address html 5, but note that it doesn't change anything here.
> 
> The problem is this:
> 
>    <div rel="foaf:depiction" href="http://2sea.org/2sealogo.png"/>
>    <div rel="cert:key">
>        ...
>    </div>
> 
> An xml parser sees a closed div, followed by another div. An html parser
> ignores the trailing slash on a non-void element such as div, so it sees an
> unclosed div and repairs it as follows:
> 
>    <div rel="foaf:depiction" href="http://2sea.org/2sealogo.png">
>      <div rel="cert:key">
>        ...
>      </div>
>    </div> <!-- close that div -->
> 
> i.e. one div contains another now, and thus you find
> 
> <http://2sea.org/2sealogo.png> cert:key ....
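The nesting difference can be reproduced with Python's standard library. This is a minimal sketch, not Shellac's parser: `xml.etree.ElementTree` stands in for the XML parser, and `SoupTreeBuilder` is a hypothetical, much-simplified tree builder on top of `html.parser.HTMLParser` that applies just one HTML5 rule, namely that the trailing `/` is ignored on non-void elements such as `div`. The wrapping `<body>` element and the missing prefix declarations are simplifications for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

SNIPPET = """<body>
<div rel="foaf:depiction" href="http://2sea.org/2sealogo.png"/>
<div rel="cert:key">...</div>
</body>"""
# Damian's fix: close the first div explicitly.
FIXED = SNIPPET.replace('.png"/>', '.png"></div>')

# HTML void elements: the only ones a trailing "/" can legitimately close.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class Node:
    def __init__(self, tag, attrs, parent=None):
        self.tag, self.attrs, self.parent = tag, dict(attrs), parent
        self.children = []


class SoupTreeBuilder(HTMLParser):
    """Hypothetical, much-simplified tag-soup tree builder."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document", [])
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID:
            self.current = node  # stays open until a matching end tag

    def handle_startendtag(self, tag, attrs):
        # HTML5 ignores the self-closing "/" on non-void elements,
        # so <div .../> is treated exactly like <div ...>.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent


def xml_parent_of_cert_key(markup):
    """(tag, rel) of the element enclosing rel="cert:key", XML parsing."""
    root = ET.fromstring(markup)
    for parent in root.iter():
        for child in parent:
            if child.get("rel") == "cert:key":
                return parent.tag, parent.get("rel")
    return None


def soup_parent_of_cert_key(markup):
    """(tag, rel) of the element enclosing rel="cert:key", tag-soup parsing."""
    builder = SoupTreeBuilder()
    builder.feed(markup)
    builder.close()
    stack = [builder.root]
    while stack:
        node = stack.pop()
        for child in node.children:
            if child.attrs.get("rel") == "cert:key":
                return node.tag, node.attrs.get("rel")
            stack.append(child)
    return None


print("XML parser:", xml_parent_of_cert_key(SNIPPET))    # → ('body', None)
print("HTML parser:", soup_parent_of_cert_key(SNIPPET))  # → ('div', 'foaf:depiction')
print("HTML parser, fixed:", soup_parent_of_cert_key(FIXED))  # → ('body', None)
```

With the minimized div, the tag-soup tree puts the `cert:key` div inside the `foaf:depiction` div, which is why an RDFa processor then takes the image URL as the subject of `cert:key`. With an explicit `</div>`, both parsers build the same tree.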
> 
> I ought to add a utility to switch the parser based on content type;
> however, in practice there's so much broken xhtml out there that tag soup
> parsing is much safer (although it does lead to issues like this).
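A content-type switch of the kind Damian mentions might look like the sketch below. `choose_parser` is a hypothetical helper, not part of Shellac's parser or of any library: it treats `application/xhtml+xml` and other XML media types as XML and falls back to tag-soup HTML parsing for everything else, including missing or unknown types.

```python
def choose_parser(content_type):
    """Return "xml" or "html" for an HTTP Content-Type header value.

    Hypothetical helper: XML media types get a strict XML parser;
    anything else (text/html, missing or unknown types) gets a
    forgiving tag-soup HTML parser.
    """
    if not content_type:
        return "html"
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in ("application/xml", "text/xml") or media_type.endswith("+xml"):
        return "xml"
    return "html"


for header in ("application/xhtml+xml; charset=utf-8", "text/html", None):
    print(header, "->", choose_parser(header))
```

Defaulting to HTML trades strictness for robustness: a well-formed XHTML document served as `text/html` is still parsed as tag soup, which is exactly the case where explicitly closed elements matter.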
> 
> My advice would be to expect tag soup parsing in the wild and change the
> html:
> 
>    <div rel="foaf:depiction" href="http://2sea.org/2sealogo.png"></div>
> 
> +1. In Drupal 7 RDFa we purposely avoided the minimized form of elements to ensure maximum compatibility, following this rule: http://www.w3.org/TR/xhtml1/#C_3. So elements are always explicitly closed, like this:
> 
> <span rel="schema:url" resource="/event/drupalcamp-nyc"></span>
> 
> Steph.
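The rule in Appendix C.3 of XHTML 1.0 is easy to check mechanically. The sketch below is a hypothetical linter, built on Python's `html.parser.HTMLParser`, that reports every element written in minimized form (`<tag/>`) that is not an HTML void element, i.e. the pattern a tag-soup parser silently turns into an unclosed element. The `logo.png` image in the sample page is made up for illustration.

```python
from html.parser import HTMLParser

# HTML void elements: the only ones for which "<tag/>" is harmless.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class MinimizedElementLinter(HTMLParser):
    """Hypothetical linter: flags "<tag/>" on non-void elements."""

    def __init__(self):
        super().__init__()
        self.problems = []  # list of (tag, line number)

    def handle_startendtag(self, tag, attrs):
        # Called only for tags written as "<tag .../>".
        if tag not in VOID:
            line, _ = self.getpos()
            self.problems.append((tag, line))


def find_minimized(markup):
    linter = MinimizedElementLinter()
    linter.feed(markup)
    linter.close()
    return linter.problems


page = """<div rel="foaf:depiction" href="http://2sea.org/2sealogo.png"/>
<span rel="schema:url" resource="/event/drupalcamp-nyc"></span>
<img src="logo.png" alt=""/>"""

for tag, line in find_minimized(page):
    print(f"line {line}: <{tag}/> should be written as <{tag}></{tag}>")
```

Only the minimized `div` is flagged: the explicitly closed `span` follows the Drupal convention, and `<img/>` is a void element, so the trailing slash is harmless there.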
> 
>  
> 
> Hope this makes sense,
> 
> Damian
> 
> 

Social Web Architect
http://bblfish.net/
Received on Friday, 6 January 2012 22:49:38 GMT
