- From: Damian Steer <pldms@mac.com>
- Date: Fri, 06 Jan 2012 14:57:04 +0000
- To: Henry Story <henry.story@bblfish.net>
- CC: Jürgen Jakobitsch <j.jakobitsch@semantic-web.at>, "public-xg-webid@w3.org XG" <public-xg-webid@w3.org>
Hi Henry and Jürgen,

On 06/01/12 12:49, Henry Story wrote:
> Shellac's parser parses the xhtml correctly as xhtml in fact, but
> when the html parser is used it comes to a different conclusion.

Yes, this is becoming a classic issue, and has nothing to do with RDFa (although RDFa obscures the issue horribly).

> RDFA 1 is defined in xhtml only I understand, so it is true that we
> are going beyond what the spec by trying to parse html too. Perhaps
> this will be a lot simplified with rdfa1.1 which can be made to work
> with html5.

Yes, RDFa 1.0 is only really defined for xhtml, although useful work was done on html 5 at the time (there are some html 5 tests). RDFa 1.1 does address html 5, but note that it doesn't change anything here.

The problem is this:

    <div rel="foaf:depiction" href="http://2sea.org/2sealogo.png"/>
    <div rel="cert:key"> ... </div>

An xml parser sees a closed div, followed by another div. An html parser sees a broken div so repairs it as follows:

    <div rel="foaf:depiction" href="http://2sea.org/2sealogo.png">
      <div rel="cert:key"> ... </div>
    </div> <!-- close that div -->

i.e. one div contains another now, and thus you find

    <http://2sea.org/2sealogo.png> cert:key ....

I ought to add a utility to switch the parser based on content type, however in practice there's so much broken xhtml out there that tag soup parsing is much safer (although it does lead to issues like this).

My advice would be to expect tag soup parsing in the wild and change the html:

    <div rel="foaf:depiction" href="http://2sea.org/2sealogo.png"></div>

Hope this makes sense,

Damian
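
The nesting difference described in this message can be reproduced with a small sketch using only the Python standard library. This is an illustration, not the parser discussed in the thread: the `<body>` wrapper, the placeholder text `key`, and the names `xml_parent_rel`, `html_parent_rel` and `TagSoupNesting` are all assumptions made for the example. The HTML side is only an approximation of tag soup repair: it models the single rule that an HTML parser ignores the trailing slash on non-void elements such as `div`, and it reports the `rel` of whichever element ends up as the parent of the `cert:key` div.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# The markup from the message, wrapped in a <body> so it has a single root.
SNIPPET = (
    '<body>'
    '<div rel="foaf:depiction" href="http://2sea.org/2sealogo.png"/>'
    '<div rel="cert:key">key</div>'
    '</body>'
)

# HTML void elements: the only ones that never take a closing tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


def xml_parent_rel(snippet):
    """Under XML rules, return the rel of the parent of the cert:key div."""
    root = ET.fromstring(snippet)
    for parent in root.iter():
        for child in parent:
            if child.get("rel") == "cert:key":
                return parent.get("rel")
    return None


class TagSoupNesting(HTMLParser):
    """Tracks open elements the way an HTML parser would for <div/>."""

    def __init__(self):
        super().__init__()
        self.stack = []            # (tag, rel) for each open element
        self.key_parent_rel = None

    def handle_starttag(self, tag, attrs):
        rel = dict(attrs).get("rel")
        if rel == "cert:key":
            self.key_parent_rel = self.stack[-1][1] if self.stack else None
        if tag not in VOID:
            self.stack.append((tag, rel))

    def handle_startendtag(self, tag, attrs):
        # HTML ignores "/>" on non-void elements: treat it as a plain open tag.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        # Close the nearest matching open element and anything inside it.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def html_parent_rel(snippet):
    """Under the approximated HTML rule, return the rel of the parent."""
    parser = TagSoupNesting()
    parser.feed(snippet)
    parser.close()
    return parser.key_parent_rel


if __name__ == "__main__":
    print("xml parent rel: ", xml_parent_rel(SNIPPET))
    print("html parent rel:", html_parent_rel(SNIPPET))
```

Under the XML reading the `cert:key` div is a child of the wrapper, so no `rel` is inherited; under the HTML reading its parent is the `foaf:depiction` div, which is how the key ends up attached to the image URL.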
Received on Friday, 6 January 2012 14:58:20 UTC