Re: rdfa parsing issue -- was: fixed https://foafssl.org/test/WebId from Henry Story on 2012-01-06 (public-xg-webid@w3.org from January 2012)

From: Henry Story <henry.story@bblfish.net>
Date: Fri, 6 Jan 2012 21:10:58 +0100
To: Damian Steer <pldms@mac.com>
Cc: Jürgen Jakobitsch <j.jakobitsch@semantic-web.at>, "public-xg-webid@w3.org XG" <public-xg-webid@w3.org>
Message-Id: <28671BAD-AF41-44FC-A316-71F27295782B@bblfish.net>

Thanks Damian,

  that was very helpful. 

I have now fixed a couple of issues on my side, and I see that Jürgen has even updated his markup to be closer to valid xhtml. So the foafssl.org tester should work with that resource in any case.

Btw, I get the following in the logs:

 INF: [console logger] dispatch: 2sea.org GET /sea.jsp HTTP/1.1
ERROR [pool-3-thread-5] (RDFDefaultErrorHandler.java:40) - http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd(line 106 column 22): {E213} Unexpected end of file from server

It looks like the RDFa parser is following the DTDs. Is there a way to stop that? I guess the W3C does not serve those files.
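
For reference, one common way to stop an XML parser from fetching DTDs is to install an entity resolver that answers every external entity request with an empty document. The following is only a sketch using the JDK's own DOM parser; the class name NoDtdFetch and its parse helper are made up for illustration, and whether the RDFa parser in use here lets you supply such a resolver (or your own XMLReader) is an assumption that would need checking.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any RDFa library.
public class NoDtdFetch {
    // Sample document that references the DTD seen in the log above.
    static final String SAMPLE =
        "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML+RDFa 1.0//EN\" "
        + "\"http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd\">"
        + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body/></html>";

    // Answers every external entity request (including the DTD) with an
    // empty document, so the parser never opens a network connection.
    static final EntityResolver EMPTY =
        (publicId, systemId) -> new InputSource(new StringReader(""));

    static Document parse(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver(EMPTY);
        return builder.parse(new InputSource(new StringReader(xhtml)));
    }

    public static void main(String[] args) throws Exception {
        // Parses without contacting www.w3.org.
        System.out.println(parse(SAMPLE).getDocumentElement().getLocalName());
    }
}
```

One caveat with this approach: with an empty DTD, named entities such as &nbsp; are no longer declared, so a document that uses them will fail to parse. Serving a local copy of the DTD from the resolver avoids that.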

Henry


On 6 Jan 2012, at 15:57, Damian Steer wrote:

> Hi Henry and Jürgen,
> 
> On 06/01/12 12:49, Henry Story wrote:
> 
>> Shellac's parser parses the xhtml correctly as xhtml in fact, but 
>> when the html parser is used it comes to a different conclusion.
> 
> Yes, this is becoming a classic issue, and has nothing to do with RDFa
> (although RDFa obscures the issue horribly).
> 
>> RDFa 1.0 is defined in xhtml only, I understand, so it is true that we
>> are going beyond what the spec says by trying to parse html too. Perhaps
>> this will be a lot simpler with RDFa 1.1, which can be made to work
>> with html5.
> 
> Yes, RDFa 1.0 is only really defined for xhtml, although useful work was
> done on html 5 at the time (there are some html 5 tests). RDFa 1.1 does
> address html 5, but note that it doesn't change anything here.
> 
> The problem is this:
> 
>    <div rel="foaf:depiction" href="http://2sea.org/2sealogo.png"/>
>    <div rel="cert:key">
> 	...
>    </div>
> 
> An xml parser sees a closed div, followed by another div. An html parser
> sees a broken div so repairs it as follows:
> 
>    <div rel="foaf:depiction" href="http://2sea.org/2sealogo.png">
>      <div rel="cert:key">
>        ...
>      </div>
>    </div> <!-- close that div -->
> 
> i.e. one div contains another now, and thus you find
> 
> <http://2sea.org/2sealogo.png> cert:key ....
> 
> I ought to add a utility to switch the parser based on content type,
> however in practice there's so much broken xhtml out there that tag soup
> parsing is much safer (although it does lead to issues like this).
> 
> My advice would be to expect tag soup parsing in the wild and change the
> html:
> 
>    <div rel="foaf:depiction" href="http://2sea.org/2sealogo.png"></div>
> 
> Hope this makes sense,
> 
> Damian
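
To make the difference Damian describes concrete, here is a small sketch that feeds both forms to the JDK's XML parser and checks whether the cert:key div ends up inside the foaf:depiction div. The class name DivNesting and the helper keyInsideDepiction are invented for illustration. The "repaired" string is written out by hand from Damian's description of what an html parser produces; no html parser is run here, and the contents of the cert:key div are left empty, as they are elided in the original.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of any RDFa library.
public class DivNesting {
    // The markup as written: a self-closed div followed by a second div.
    static final String AS_WRITTEN = "<body>"
        + "<div rel=\"foaf:depiction\" href=\"http://2sea.org/2sealogo.png\"/>"
        + "<div rel=\"cert:key\"></div>"
        + "</body>";

    // The tree a tag soup parser builds, written out explicitly.
    static final String AS_REPAIRED = "<body>"
        + "<div rel=\"foaf:depiction\" href=\"http://2sea.org/2sealogo.png\">"
        + "<div rel=\"cert:key\"></div>"
        + "</div>"
        + "</body>";

    // True if the cert:key div is a direct child of the depiction div.
    static boolean keyInsideDepiction(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)))
            .getDocumentElement();
        NodeList divs = root.getElementsByTagName("div");
        for (int i = 0; i < divs.getLength(); i++) {
            Element div = (Element) divs.item(i);
            if (!"cert:key".equals(div.getAttribute("rel"))) {
                continue;
            }
            Node parent = div.getParentNode();
            return parent instanceof Element
                && "foaf:depiction".equals(((Element) parent).getAttribute("rel"));
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(keyInsideDepiction(AS_WRITTEN));   // false: siblings
        System.out.println(keyInsideDepiction(AS_REPAIRED));  // true: nested
    }
}
```

In the nested case the image URL becomes the subject for the inner div, which is how the stray cert:key statement about the logo arises.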

Social Web Architect
http://bblfish.net/

Received on Friday, 6 January 2012 20:18:28 UTC