
Re: rdfa parsing issue -- was: fixed https://foafssl.org/test/WebId

From: Stéphane Corlosquet <scorlosquet@gmail.com>
Date: Fri, 6 Jan 2012 11:57:09 -0500
Message-ID: <CAGR+nnGwJJZt-YYXp-LpBmrb3C7LA5cunz9CtuGiO=f43=N2vQ@mail.gmail.com>
To: Damian Steer <pldms@mac.com>
Cc: Henry Story <henry.story@bblfish.net>, Jürgen Jakobitsch <j.jakobitsch@semantic-web.at>, "public-xg-webid@w3.org XG" <public-xg-webid@w3.org>
On Fri, Jan 6, 2012 at 9:57 AM, Damian Steer <pldms@mac.com> wrote:

> Hi Henry and Jürgen,
>
> On 06/01/12 12:49, Henry Story wrote:
>
> > Shellac's parser parses the xhtml correctly as xhtml in fact, but
> > when the html parser is used it comes to a different conclusion.
>
> Yes, this is becoming a classic issue, and has nothing to do with RDFa
> (although RDFa obscures the issue horribly).
>
> > RDFa 1.0 is defined for xhtml only, as I understand it, so it is true
> > that we are going beyond what the spec defines by trying to parse html
> > too. Perhaps this will be a lot simpler with RDFa 1.1, which can be
> > made to work with html5.
>
> Yes, RDFa 1.0 is only really defined for xhtml, although useful work was
> done on html 5 at the time (there are some html 5 tests). RDFa 1.1 does
> address html 5, but note that it doesn't change anything here: the
> difference comes from the parser, not from RDFa.
>
> The problem is this:
>
>    <div rel="foaf:depiction" href="http://2sea.org/2sealogo.png"/>
>    <div rel="cert:key">
>        ...
>    </div>
>
> An xml parser sees a closed div, followed by another div. An html parser
> sees a broken div so repairs it as follows:
>
>    <div rel="foaf:depiction" href="http://2sea.org/2sealogo.png">
>      <div rel="cert:key">
>        ...
>      </div>
>    </div> <!-- close that div -->
>
> i.e. one div contains another now, and thus you find
>
> <http://2sea.org/2sealogo.png> cert:key ....
>
> I ought to add a utility to switch the parser based on content type,
> however in practice there's so much broken xhtml out there that tag soup
> parsing is much safer (although it does lead to issues like this).
>
> My advice would be to expect tag soup parsing in the wild and change the
> html:
>
>    <div rel="foaf:depiction" href="http://2sea.org/2sealogo.png"></div>
>

+1. In Drupal 7 RDFa we purposely avoided the minimized form of elements
to ensure maximum compatibility. Here is the guideline we followed:
http://www.w3.org/TR/xhtml1/#C_3. So empty elements are always explicitly
closed, like this:

<span rel="schema:url" resource="/event/drupalcamp-nyc"></span>
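
To make the difference Damian describes concrete, here is a rough Python
sketch using only the standard library. It is not how Shellac's parser or
Drupal works. The XML side uses xml.etree.ElementTree; the HTML side is a
hand-rolled approximation of tag soup handling built on html.parser, in
which the override of handle_startendtag (treating "/>" on a non-void
element as a plain start tag) is my assumption about what an html parser
does with the markup above. The helper names xml_parent_rel and
html_parent_rel, and the <body> wrapper, are made up for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

MINIMIZED = ('<body><div rel="foaf:depiction" href="http://2sea.org/2sealogo.png"/>'
             '<div rel="cert:key"></div></body>')
EXPLICIT = ('<body><div rel="foaf:depiction" href="http://2sea.org/2sealogo.png"></div>'
            '<div rel="cert:key"></div></body>')

# Elements that html treats as always empty; "/>" is harmless on these.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


def xml_parent_rel(markup):
    """Return the rel of the element containing the cert:key div (XML view).

    Returns None when the parent carries no rel (or no cert:key div exists).
    """
    root = ET.fromstring(markup)
    for parent in root.iter():
        for child in parent:
            if child.get("rel") == "cert:key":
                return parent.get("rel")
    return None


class TagSoup(HTMLParser):
    """Tracks open elements the way a tag soup parser roughly would."""

    def __init__(self):
        super().__init__()
        self.stack = []            # (tag, rel) for each open element
        self.parent_rel = None     # rel of the element containing cert:key

    def handle_starttag(self, tag, attrs):
        rel = dict(attrs).get("rel")
        if rel == "cert:key" and self.stack:
            self.parent_rel = self.stack[-1][1]
        if tag not in VOID:
            self.stack.append((tag, rel))

    def handle_startendtag(self, tag, attrs):
        # Assumption: "/>" is ignored on non-void elements, so the div
        # stays open until some later end tag closes it.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, and anything inside it.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def html_parent_rel(markup):
    """Return the rel of the element containing the cert:key div (html view)."""
    parser = TagSoup()
    parser.feed(markup)
    parser.close()
    return parser.parent_rel


print("xml, minimized: ", xml_parent_rel(MINIMIZED))
print("html, minimized:", html_parent_rel(MINIMIZED))
print("html, explicit: ", html_parent_rel(EXPLICIT))
```

With the minimized div, the XML view reports no rel on the parent, while
the html view reports foaf:depiction as the parent, which is where the
stray cert:key triple comes from. With the explicitly closed div both
views agree.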

Steph.



>
> Hope this makes sense,
>
> Damian
>
>
Received on Friday, 6 January 2012 17:27:02 GMT
