Re: rdfa parsing issue -- was: fixed https://foafssl.org/test/WebId from Henry Story on 2012-01-06 (public-xg-webid@w3.org from January 2012)

From: Henry Story <henry.story@bblfish.net>
Date: Sat, 7 Jan 2012 00:39:19 +0100
To: Kingsley Idehen <kidehen@openlinksw.com>
Cc: public-xg-webid@w3.org
Message-Id: <F1E1ABB8-521B-45BC-804A-372FA4C59FE3@bblfish.net>
On 7 Jan 2012, at 00:02, Kingsley Idehen wrote:

> On 1/6/12 1:37 PM, Henry Story wrote:
>> 
>> Yes, well, this really should be documented on our wiki. This is the kind of thing that
>> is going to turn up again and again. I even wonder if it should be in the spec as a note.
> 
> Why? 
> 
> What does it have to do with the spec? Again, you are bringing parsing into a spec that isn't about parsing. Make a side note in the Wiki for people that are using your approach and related libraries and tools. 

Yes, it was a dramatic exaggeration. There are things we should document.

Is there someone who could be our wiki master? That is, someone who helps organise the wiki, makes sure that good ideas are put into it (even if only as pointers to the e-mails), and can help guide people to fill out the details. I think that would be a useful role.

	Henry

> 
> Kingsley 
>> 
>> Henry
>> 
>> 
>> On 6 Jan 2012, at 17:57, Stéphane Corlosquet wrote:
>> 
>>> 
>>> 
>>> On Fri, Jan 6, 2012 at 9:57 AM, Damian Steer <pldms@mac.com> wrote:
>>> Hi Henry and Jürgen,
>>> 
>>> On 06/01/12 12:49, Henry Story wrote:
>>> 
>>> > Shellac's parser parses the xhtml correctly as xhtml in fact, but
>>> > when the html parser is used it comes to a different conclusion.
>>> 
>>> Yes, this is becoming a classic issue, and has nothing to do with RDFa
>>> (although RDFa obscures the issue horribly).
>>> 
>>> > RDFa 1 is defined in xhtml only, I understand, so it is true that we
>>> > are going beyond the spec by trying to parse html too. Perhaps
>>> > this will be much simpler with RDFa 1.1, which can be made to work
>>> > with html5.
>>> 
>>> Yes, RDFa 1.0 is only really defined for xhtml, although useful work was
>>> done on html 5 at the time (there are some html 5 tests). RDFa 1.1 does
>>> address html 5, but note that it doesn't change anything here.
>>> 
>>> The problem is this:
>>> 
>>>    <div rel="foaf:depiction" href="http://2sea.org/2sealogo.png"/>
>>>    <div rel="cert:key">
>>>        ...
>>>    </div>
>>> 
>>> An xml parser sees a closed div, followed by another div. An html parser
>>> sees a broken div so repairs it as follows:
>>> 
>>>    <div rel="foaf:depiction" href="http://2sea.org/2sealogo.png">
>>>      <div rel="cert:key">
>>>        ...
>>>      </div>
>>>    </div> <!-- close that div -->
>>> 
>>> i.e. one div contains another now, and thus you find
>>> 
>>> <http://2sea.org/2sealogo.png> cert:key ....
>>> 
>>> I ought to add a utility to switch the parser based on content type,
>>> however in practice there's so much broken xhtml out there that tag soup
>>> parsing is much safer (although it does lead to issues like this).
>>> 
>>> My advice would be to expect tag soup parsing in the wild and change the
>>> html:
>>> 
>>>    <div rel="foaf:depiction" href="http://2sea.org/2sealogo.png"></div>
>>> 
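A minimal Python sketch of the difference described above, using only the standard library. It reports which element ends up as the parent of the `cert:key` div under XML parsing and under HTML-style parsing. Several things here are assumptions made for illustration and are not from the thread: the wrapper `<div>`, the empty body of the `cert:key` div (its content is elided in the original), and the names `xml_parent_rel`, `html_parent_rel`, `TagSoupSketch` and `VOID`. Python's `html.parser` on its own reports `<div .../>` as a start tag plus an end tag, so the sketch overrides `handle_startendtag` to mimic the HTML rule that a trailing slash on a non-void element is ignored. A real tag soup parser does far more repair than this.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Markup from the thread, inside a hypothetical wrapper div.
BROKEN = (
    '<div>'
    '<div rel="foaf:depiction" href="http://2sea.org/2sealogo.png"/>'
    '<div rel="cert:key"></div>'
    '</div>'
)
# Same markup with the first div closed explicitly, as advised.
FIXED = BROKEN.replace('"/>', '"></div>')

# HTML void elements: the only ones that never take an end tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


def xml_parent_rel(markup):
    """Return the rel of the parent of the cert:key div, parsed as XML."""
    root = ET.fromstring(markup)
    for parent in root.iter():
        for child in parent:
            if child.get("rel") == "cert:key":
                return parent.get("rel")  # None if the parent has no rel
    raise LookupError("no cert:key element found")


class TagSoupSketch(HTMLParser):
    """Tracks open elements the way an HTML parser would nest them."""

    def __init__(self):
        super().__init__()
        self.stack = []          # rel values of currently open elements
        self.found = False
        self.parent_rel = None

    def handle_starttag(self, tag, attrs):
        rel = dict(attrs).get("rel")
        if rel == "cert:key":
            self.found = True
            self.parent_rel = self.stack[-1] if self.stack else None
        if tag not in VOID:
            self.stack.append(rel)

    def handle_startendtag(self, tag, attrs):
        # Assumption: mimic HTML parsing, where "/>" on a non-void
        # element is ignored and the element simply stays open.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag not in VOID and self.stack:
            self.stack.pop()


def html_parent_rel(markup):
    """Return the rel of the parent of the cert:key div, parsed as HTML."""
    parser = TagSoupSketch()
    parser.feed(markup)
    parser.close()
    if not parser.found:
        raise LookupError("no cert:key element found")
    return parser.parent_rel


if __name__ == "__main__":
    print("broken, xml :", xml_parent_rel(BROKEN))
    print("broken, html:", html_parent_rel(BROKEN))
    print("fixed,  xml :", xml_parent_rel(FIXED))
    print("fixed,  html:", html_parent_rel(FIXED))
```

With the minimized `<div .../>` the two parsers disagree about the parent of `cert:key`, which is what makes the key appear to hang off the depiction. With the explicit `</div>` both give the same nesting.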
>>> +1. In Drupal 7 RDFa we purposely didn't use the minimized form of elements, to ensure maximum compatibility. Here is the link to the rule we used: http://www.w3.org/TR/xhtml1/#C_3. So elements are always explicitly closed, like this:
>>> 
>>> <span rel="schema:url" resource="/event/drupalcamp-nyc"></span>
>>> 
>>> Steph.
>>> 
>>>  
>>> 
>>> Hope this makes sense,
>>> 
>>> Damian
>>> 
>>> 
>> 
>> Social Web Architect
>> http://bblfish.net/
>> 
> 
> 
> -- 
> 
> Regards,
> 
> Kingsley Idehen	      
> Founder & CEO 
> OpenLink Software     
> Company Web: http://www.openlinksw.com
> Personal Weblog: http://www.openlinksw.com/blog/~kidehen
> Twitter/Identi.ca handle: @kidehen
> Google+ Profile: https://plus.google.com/112399767740508618350/about
> LinkedIn Profile: http://www.linkedin.com/in/kidehen
> 
> 
> 
> 

Social Web Architect
http://bblfish.net/
Received on Friday, 6 January 2012 23:39:56 UTC