Testing Google's Rich Snippets RDFa support from Philip Taylor on 2009-09-12 (public-rdf-in-xhtml-tf@w3.org from September 2009)

From: Philip Taylor <pjt47@cam.ac.uk>
Date: Sat, 12 Sep 2009 17:07:20 +0100
To: RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>
Message-ID: <4AABC738.4000802@cam.ac.uk>

As a follow-up to the old news linked from 
<http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2009May/0064.html>: 
Google has now made available a testing tool at 
<http://www.google.com/webmasters/tools/richsnippets>. As far as I'm 
aware it's using the same code that the real search engine results use.

I tested it a bit, and it seems that what's implemented in that tool 
bears very little relation to RDFa. It's not simply a buggy 
implementation - it doesn't even appear to attempt to process RDFa 
correctly.

http://philip.html5.org/demos/rdfa/google-rich-snippets.html shows a few 
examples. It rejects some perfectly correct RDFa markup; it interprets 
some perfectly correct RDFa markup incorrectly; and it accepts some 
totally broken RDFa markup.

For example, the documentation at 
http://www.google.com/support/webmasters/bin/answer.py?answer=146646 
includes:

   <a href="http://darryl-blog.example.com/" rel="v:friend">Darryl</a>

Google's tool says the output has "friend = Darryl", whereas RDFa says 
to ignore the element content and output a triple "... 
<http://rdf.data-vocabulary.org/#friend> 
<http://darryl-blog.example.com/>" instead, so the markup is being 
interpreted incorrectly.

With input like:

   <span property="v:name" datatype="">John <span property="v:nickname">Smith</span></span>

Google's tool only extracts the name and ignores the nickname triple that 
an RDFa processor would generate, so it's again failing to interpret the 
markup correctly.

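With input like <span property="v:name" content="John 
Smith">error</span>, it returns "name = error", whereas RDFa says the 
'content' attribute overrides the element content, so the name should 
be "John Smith".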

So it seems to totally ignore attributes like 'datatype' and 'content', 
and treats 'rel' identically to 'property', as far as I can tell.
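
To make the difference concrete, here is a rough Python sketch of the two 
behaviours for a single element. It is a heavy simplification of the RDFa 
rules (no subject resolution, no chaining, no datatype handling), the 
function names are hypothetical, and the second function only models what 
the tool appears to do from the outside, not Google's actual code.

```python
# Simplified model: an element is its attribute dict plus its text content.

def rdfa_value(attrs, text):
    """Object an RDFa processor would emit for this element (simplified)."""
    if "rel" in attrs:
        # @rel takes its object from @href; the element content is ignored.
        return ("uri", attrs.get("href"))
    if "property" in attrs:
        # @property prefers @content over the element's text content.
        return ("literal", attrs.get("content", text))
    return None

def observed_tool_value(attrs, text):
    """Apparent tool behaviour (assumption): always use the text content."""
    if "rel" in attrs or "property" in attrs:
        return ("literal", text)
    return None

friend = {"href": "http://darryl-blog.example.com/", "rel": "v:friend"}
name = {"property": "v:name", "content": "John Smith"}

print(rdfa_value(friend, "Darryl"))           # a URI object taken from href
print(observed_tool_value(friend, "Darryl"))  # the link text instead
print(rdfa_value(name, "error"))              # the content attribute value
print(observed_tool_value(name, "error"))     # the element text instead
```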


Also, the tool accepts input like:

   <div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Person">
     <span property="v:name">John Smith</span>
   </div>

while it rejects equivalent input like:

   <div xmlns:v="http://rdf.data-vocabulary.or" typeof="v:g/#Person">
     <span property="v:g/#name">John Smith</span>
   </div>

It also accepts input like:

   <div xmlns:v="http://arbitrary.example.org/#" typeof="v:Person">
     <span property="v:name">John Smith</span>
   </div>

and apparently ignores the fact that it's in a different namespace, 
processing the data as if it were in "http://rdf.data-vocabulary.org/#" 
(it still shows up in the search result preview regardless of namespace, 
as long as you have the right string after the colon).

It also accepts input like:

   <div typeof="zzz:Person">
     <span property="#:name">John Smith</span>
   </div>

and emits a warning about the undeclared namespaces but otherwise 
processes it as if it were all using the correct namespace.

So it seems that Google doesn't attempt to do any kind of 
namespace/CURIE processing at all (other than a little bit for the 
harmless warning) - it simply looks at the part of the attribute value 
after the colon (case-insensitively), and ignores everything else.
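
As a rough illustration, the following Python sketch contrasts CURIE 
expansion (the declared prefix mapping concatenated with the reference) 
with the suffix matching the tool appears to do. Both function names are 
hypothetical, splitting at the first colon is an assumption, and the 
second function is only a guess based on the observations above.

```python
def expand_curie(curie, prefixes):
    """Expand prefix:reference by concatenation; None if prefix undeclared."""
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        return None
    return prefixes[prefix] + reference

def observed_tool_key(curie):
    """Apparent tool behaviour (assumption): keep only the part after the
    colon, lowercased, and ignore the prefix and its mapping entirely."""
    return curie.partition(":")[2].lower()

normal = expand_curie("v:Person", {"v": "http://rdf.data-vocabulary.org/#"})
split = expand_curie("v:g/#Person", {"v": "http://rdf.data-vocabulary.or"})
other = expand_curie("v:Person", {"v": "http://arbitrary.example.org/#"})

print(normal == split)                   # same URI under CURIE expansion
print(normal == other)                   # different URI, different namespace
print(expand_curie("zzz:Person", {}))    # undeclared prefix, no URI
print(observed_tool_key("zzz:Person"))   # the suffix alone
print(observed_tool_key("v:g/#Person"))  # a suffix that no longer matches
```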


Am I doing something wrong here, or am I missing a good reason for this 
apparent behaviour? It seems very disappointing that Google is claiming 
to support RDFa while failing to implement it in a way that is remotely 
correct or compatible with other RDFa processors.

-- 
Philip Taylor
pjt47@cam.ac.uk
Received on Saturday, 12 September 2009 16:08:03 UTC