- From: Philip Taylor <pjt47@cam.ac.uk>
- Date: Sat, 12 Sep 2009 17:07:20 +0100
- To: RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>
As a followup to the old news linked from <http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2009May/0064.html>: Google has now made available a testing tool at <http://www.google.com/webmasters/tools/richsnippets>. As far as I'm aware it's using the same code that the real search engine results use.

I tested it a bit, and it seems that what's implemented in that tool bears very little relation to RDFa. It's not simply a buggy implementation - it's not even attempting to handle RDFa remotely correctly.

http://philip.html5.org/demos/rdfa/google-rich-snippets.html shows a few examples. It rejects some perfectly correct RDFa markup; it interprets some perfectly correct RDFa markup incorrectly; and it accepts some totally broken RDFa markup.

For example, the documentation at http://www.google.com/support/webmasters/bin/answer.py?answer=146646 includes:

    <a href="http://darryl-blog.example.com/" rel="v:friend">Darryl</a>

Google's tool says the output has "friend = Darryl", whereas RDFa says to ignore the element content and output a triple "... <http://rdf.data-vocabulary.org/#friend> <http://darryl-blog.example.com/>" instead, so the markup is being interpreted incorrectly.

With input like

    <span property="v:name" datatype="">John <span property="v:nickname">Smith</span></span>

Google's tool only extracts the name and ignores the nickname triple that an RDFa processor would generate, so it's again failing to interpret the markup correctly.

With input like

    <span property="v:name" content="John Smith">error</span>

it returns "name = error".

So it seems to totally ignore attributes like 'datatype' and 'content', and treats 'rel' identically to 'property', as far as I can tell.
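The difference described above can be shown with a minimal sketch. It is a deliberate simplification, not an RDFa processor: it looks at one element at a time and ignores nesting, 'datatype' and subject resolution. The function names `rdfa_object` and `observed_tool_value` are hypothetical, and the second function only models the behaviour reported in this message; it is not Google's code.

```python
def rdfa_object(attrs, text):
    """Simplified RDFa rule for the object of a triple on a single element.

    attrs: dict of the element's attributes; text: its text content.
    """
    # 'rel' with 'href' yields a resource (IRI) object; element text is ignored.
    if "rel" in attrs and "href" in attrs:
        return ("iri", attrs["href"])
    # 'property' yields a literal; 'content' overrides the element text.
    if "property" in attrs:
        if "content" in attrs:
            return ("literal", attrs["content"])
        return ("literal", text)
    return None


def observed_tool_value(attrs, text):
    """Hypothetical model of the reported behaviour: 'rel' is treated like
    'property', and the element text is always used as the value."""
    if "rel" in attrs or "property" in attrs:
        return ("literal", text)
    return None


friend = {"href": "http://darryl-blog.example.com/", "rel": "v:friend"}
name = {"property": "v:name", "content": "John Smith"}

print(rdfa_object(friend, "Darryl"), observed_tool_value(friend, "Darryl"))
print(rdfa_object(name, "error"), observed_tool_value(name, "error"))
```

In both cases the two functions disagree, which matches the "friend = Darryl" and "name = error" results quoted above.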
Also, the tool accepts input like:

    <div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Person">
      <span property="v:name">John Smith</span>
    </div>

while it rejects equivalent input like:

    <div xmlns:v="http://rdf.data-vocabulary.or" typeof="v:g/#Person">
      <span property="v:g/#name">John Smith</span>
    </div>

It also accepts input like:

    <div xmlns:v="http://arbitrary.example.org/#" typeof="v:Person">
      <span property="v:name">John Smith</span>
    </div>

and apparently entirely ignores that it's in a different namespace, and processes the data as if it were in "http://rdf.data-vocabulary.org/#" (it still shows up in the search result preview regardless of namespace, as long as you have the right string after the colon).

It also accepts input like:

    <div typeof="zzz:Person">
      <span property="#:name">John Smith</span>
    </div>

and emits a warning about the undeclared namespaces but otherwise processes it as if it were all using the correct namespace.

So it seems that Google doesn't attempt to do any kind of namespace/CURIE processing at all (other than a little bit for the harmless warning) - it simply looks at the part of the attribute value after the colon (case-insensitively), and ignores everything else.

Am I doing something wrong here, or am I missing a good reason for this apparent behaviour? It seems very disappointing that Google is claiming to support RDFa while failing to implement it in a way that is remotely correct or compatible with other RDFa processors.

-- 
Philip Taylor
pjt47@cam.ac.uk
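The namespace examples above can be checked with a minimal sketch of CURIE expansion, assuming the simple rule that a CURIE "prefix:reference" expands to the declared prefix IRI concatenated with the reference. The names `expand_curie` and `suffix_only` are hypothetical; `suffix_only` is only a model of the behaviour reported in this message, not Google's code.

```python
VOCAB = "http://rdf.data-vocabulary.org/#"


def expand_curie(curie, prefixes):
    """Expand 'prefix:reference' by concatenation.

    Returns None when there is no colon or the prefix is undeclared,
    in which case a processor would ignore the value.
    """
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        return None
    return prefixes[prefix] + reference


def suffix_only(curie):
    """Hypothetical model of the reported behaviour: keep only the text
    after the first colon, lowercased, and ignore the prefix entirely."""
    return curie.partition(":")[2].lower()


usual = expand_curie("v:Person", {"v": VOCAB})
split = expand_curie("v:g/#Person", {"v": "http://rdf.data-vocabulary.or"})
other = expand_curie("v:Person", {"v": "http://arbitrary.example.org/#"})

print(usual == split)  # True: both expand to the same IRI
print(usual == other)  # False: different namespace
print(expand_curie("zzz:Person", {}))  # None: undeclared prefix
print(suffix_only("zzz:Person"), suffix_only("v:g/#Person"))
```

Under concatenation the first two inputs are equivalent and the third is not, whereas a suffix-only comparison accepts "zzz:Person" and rejects "v:g/#Person".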
Received on Saturday, 12 September 2009 16:08:03 UTC