- From: Philip Taylor <pjt47@cam.ac.uk>
- Date: Tue, 15 Sep 2009 17:09:09 +0100
- To: Othar Hansson <othar@othar.com>
- CC: RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>
Othar Hansson wrote on 2009-09-12:
> Thanks for the bug reports.

Thanks for your response! (It doesn't seem to have reached the CC'd
public-rdf-in-xhtml-tf list yet, so I'll quote it in full here.)

> We should be clearer about the purpose of the preview tool. It's to
> give webmasters a preview of the rich snippet that we would produce
> based on the data found on the page. As an aid to debugging, we show
> what we parsed from the page. We should point people elsewhere if
> they want full RDFa validation.

That purpose is fine - I don't expect it to be a full RDFa processor
displaying output triples or anything like that. But I do expect the
data it extracts from a page (as used when generating the snippet
preview, and shown in the debugging output) to be 'compatible' with a
conforming RDFa processor, in the sense that the same data can (in
theory) be derived entirely by applying some transformation to the RDF
triples generated by a conforming RDFa processor.

Firstly, if Google's processor extracts data from a page that is not
extracted by a real RDFa processor, then people will write pages with
incorrect/invalid RDFa (e.g. they might use a wrong namespace URI, like
Google's own documentation did when it was first released), test it in
Google's tool, see that the output is correct, and think that everything
is fine and that they're supporting the RDFa standard. The rest of the
RDF community, using real RDFa processors, will be unable to parse and
use that incorrectly marked-up data. As I see it, the purpose of a
standard like RDFa is to ensure interoperability between producers and
consumers, in order to maximise the amount of data that can be extracted
from the web, and this is compromised if some people break
interoperability by doing things differently (particularly if it's
someone prominent like Google, with significant influence over content
producers).

Secondly, if Google's processor extracts data from one page but fails on
another page, when those pages are equivalent from the perspective of
RDFa (i.e. they generate the same set of RDF triples), then it may work
for people who copy-and-paste examples from the documentation, but it
will mislead and confuse producers who actually understand RDFa. They
will write something that works, then modify it in a way that the RDFa
documentation and RDFa tools say should make no difference (e.g. moving
some text into a @content attribute, or using the
<a xmlns:http="http:" rel="http://www..."> trick to make CURIEs that
look like full URIs - see the fragments below), and it will unexpectedly
break in Google's processor. Instead they'll have to learn a new (and
currently undocumented) syntax that is the intersection of what RDFa and
Google support, making it much more complex and more restrictive than if
Google supported RDFa in a compatible way.

I'm not personally a proponent of RDFa, and I don't have any strong
feelings against Google using proprietary or non-RDFa markup (or proper
RDFa) for this kind of thing; I just don't like it being promoted (by
Google and by RDFa supporters) as "RDFa" when it suffers from these
problems due to disregarding the standard, and it seems to me (after
looking into the details) that it will hurt the RDFa community if the
problems are not resolved. At least with proprietary markup, someone
could write a tool that parses the data into RDF alongside a normal RDFa
parser, and run both parsers over arbitrary web pages (which would be a
bit of a pain, but would be possible). As it is now, it's impossible to
write a tool that extracts the same data as Google without violating the
RDFa spec and generating incorrect output from some valid RDFa pages.
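To make that "should make no difference" point concrete (the vocabulary
and values here are purely illustrative - I'm borrowing the v: prefix
from Google's documentation, but any property URI would behave the same
way), a conforming RDFa processor extracts exactly the same triple from
both of these fragments:

  <div xmlns:v="http://rdf.data-vocabulary.org/#">
    <span property="v:name">John Smith</span>
  </div>

  <div xmlns:v="http://rdf.data-vocabulary.org/#">
    <span property="v:name" content="John Smith"></span>
  </div>

and the CURIE trick mentioned above lets a producer write what looks
like a full URI in @rel, e.g.:

  <a xmlns:http="http:" rel="http://rdf.data-vocabulary.org/#url"
     href="http://example.com/">an example link</a>

which a conforming processor expands (via the declared 'http' prefix)
into the predicate http://rdf.data-vocabulary.org/#url. Someone who
understands RDFa will expect all of these forms to be interchangeable in
any tool that claims to consume RDFa, so it's a problem if some of them
silently fail.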
> We surely have errors in our parsing (thanks for finding several:
> we'll look into these on Monday). But we will also deviate from the
> standard in some cases to be forgiving of webmaster errors. For
> example, we expect that some webmasters will forget the xmlns
> attribute entirely.

"we will [...] deviate from the standard" makes me believe that the
above problems are an unavoidable consequence of Google's intentions,
rather than just unintentional transient fixable bugs, and therefore are
a serious concern (which is why I'm writing about it like this rather
than just listing bugs).

Are you going to propose (or have you already proposed) these deviations
as an update to the RDFa Recommendation (or as a new competing standard,
or at least as a Google-hosted specification so it's documented
somewhere)? (I'm mostly just a bystander, not an active participant in
anything RDFa-related, so I might have missed some existing discussions
about this.) If not, it seems like an unexpected disregard for standards
and interoperability. So I'm hoping that's not the case!

> --Othar
> (@google)

--
Philip Taylor
pjt47@cam.ac.uk
Received on Tuesday, 15 September 2009 16:09:49 UTC