Re: rdfa parsing issue -- was: fixed https://foafssl.org/test/WebId

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Fri, 06 Jan 2012 18:02:56 -0500
Message-ID: <4F077DA0.7050902@openlinksw.com>
To: public-xg-webid@w3.org

On 1/6/12 1:37 PM, Henry Story wrote:
> Yes, well, this really should be documented on our wiki. This is the
> kind of thing that is going to turn up again and again. I even wonder
> if it should be in the spec as a note.

Why?

What does it have to do with the spec? Again, you are bringing parsing 
into a spec that isn't about parsing. Make a side note in the Wiki for 
people that are using your approach and related libraries and tools.

Kingsley
>
> Henry
>
>
> On 6 Jan 2012, at 17:57, Stéphane Corlosquet wrote:
>
>>
>>
>> On Fri, Jan 6, 2012 at 9:57 AM, Damian Steer <pldms@mac.com> wrote:
>>
>>     Hi Henry and Jürgen,
>>
>>     On 06/01/12 12:49, Henry Story wrote:
>>
>>     > Shellac's parser parses the xhtml correctly as xhtml in fact, but
>>     > when the html parser is used it comes to a different conclusion.
>>
>>     Yes, this is becoming a classic issue, and has nothing to do with RDFa
>>     (although RDFa obscures the issue horribly).
>>
>>     > RDFa 1 is defined in xhtml only, as I understand it, so it is true
>>     > that we are going beyond the spec by trying to parse html too.
>>     > Perhaps this will be much simpler with RDFa 1.1, which can be made
>>     > to work with html5.
>>
>>     Yes, RDFa 1.0 is only really defined for xhtml, although useful work
>>     was done on html 5 at the time (there are some html 5 tests). RDFa 1.1
>>     does address html 5, but note that it doesn't change anything here.
>>
>>     The problem is this:
>>
>>     <div rel="foaf:depiction" href="http://2sea.org/2sealogo.png"/>
>>     <div rel="cert:key">
>>            ...
>>     </div>
>>
>>     An xml parser sees a closed div, followed by another div. An html
>>     parser sees a broken div, so it repairs it as follows:
>>
>>     <div rel="foaf:depiction" href="http://2sea.org/2sealogo.png">
>>     <div rel="cert:key">
>>            ...
>>     </div>
>>     </div> <!-- close that div -->
>>
>>     i.e. one div contains another now, and thus you find
>>
>>     <http://2sea.org/2sealogo.png> cert:key ....
>>
>>     I ought to add a utility to switch the parser based on content type.
>>     In practice, however, there's so much broken xhtml out there that tag
>>     soup parsing is much safer (although it does lead to issues like this).
>>
>>     My advice would be to expect tag soup parsing in the wild and change
>>     the html:
>>
>>     <div rel="foaf:depiction" href="http://2sea.org/2sealogo.png"></div>
>>
>>
>> +1. In Drupal 7 RDFa we purposely didn't use the minimized version of 
>> elements to ensure maximum compatibility. Here is the link to the 
>> rule we used: http://www.w3.org/TR/xhtml1/#C_3. So elements are 
>> always explicitly closed, like this:
>>
>> <span rel="schema:url" resource="/event/drupalcamp-nyc"></span>
>>
>> Steph.
>>
>>
>>     Hope this makes sense,
>>
>>     Damian
>>
>>
>
> Social Web Architect
> http://bblfish.net/
>

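For anyone who wants to reproduce the nesting difference Damian describes,
here is a minimal Python sketch. It is an illustration only, not what any of
the parsers discussed in this thread actually do. The names xml_parent_rel,
html_parent_rel, TagSoupNesting and parent_rel_of_key are made up for this
example, the <body> wrapper is an assumption added so the XML parser has a
single root, and the HTML side is a deliberately simplified stand-in for tag
soup repair: it just ignores the trailing slash on non-void elements. Each
function reports the rel of the element directly enclosing the cert:key div.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Damian's markup, wrapped in <body> (an assumption, see above).
BROKEN = (
    '<body>'
    '<div rel="foaf:depiction" href="http://2sea.org/2sealogo.png"/>'
    '<div rel="cert:key"></div>'
    '</body>'
)
# Same markup with the first div closed explicitly, as Damian advises.
FIXED = BROKEN.replace('"/>', '"></div>')

# HTML void elements; only these may legitimately have no end tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


def xml_parent_rel(snippet):
    """XML rules: <div .../> is an empty, already closed element."""
    root = ET.fromstring(snippet)
    for parent in root.iter():
        for child in parent:
            if child.get("rel") == "cert:key":
                return parent.get("rel")
    return None


class TagSoupNesting(HTMLParser):
    """Simplified tag soup model: '/>' on a non-void element is ignored."""

    def __init__(self):
        super().__init__()
        self.stack = []        # (tag, rel) for each currently open element
        self.parent_rel = {}   # rel of an element -> rel of its parent

    def handle_starttag(self, tag, attrs):
        rel = dict(attrs).get("rel")
        if rel is not None:
            self.parent_rel[rel] = self.stack[-1][1] if self.stack else None
        if tag not in VOID:
            self.stack.append((tag, rel))

    def handle_startendtag(self, tag, attrs):
        # Treat <div .../> exactly like <div ...>: the element stays open.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, if there is one.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def html_parent_rel(snippet):
    parser = TagSoupNesting()
    parser.feed(snippet)
    parser.close()
    return parser.parent_rel.get("cert:key")


def parent_rel_of_key(snippet, content_type):
    """Hypothetical switch on media type, along the lines Damian mentions."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/xhtml+xml":
        return xml_parent_rel(snippet)
    return html_parent_rel(snippet)


print(xml_parent_rel(BROKEN))    # None: the two divs are siblings
print(html_parent_rel(BROKEN))   # foaf:depiction: cert:key div is nested
print(html_parent_rel(FIXED))    # None: explicit </div> restores siblings
```

With the explicit </div>, both code paths agree, which is why the markup
change is the safer fix regardless of which parser a consumer picks.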

-- 

Regards,

Kingsley Idehen
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen

Received on Friday, 6 January 2012 23:03:20 GMT
