- From: Philip Taylor <pjt47@cam.ac.uk>
- Date: Wed, 28 Jan 2009 01:16:46 +0000
- To: Ben Adida <ben@adida.net>
- CC: public-rdf-in-xhtml-tf@w3.org, Dan Brickley <danbri@danbri.org>, Manu Sporny <msporny@digitalbazaar.com>, Ian Hickson <ian@hixie.ch>, Henri Sivonen <hsivonen@iki.fi>, Sam Ruby <rubys@intertwingly.net>
Ben Adida wrote:
> Philip Taylor wrote:
>
>> I should probably try downloading some more recent pages, to see if
>> CC/RDFa usage is more common now...
>
> If you have time and resources to do so, that would be very useful!

I downloaded another 130K pages from dmoz.org (using their latest list of pages, so it's going to be differently biased to the previous set I used). Of those, 25 text/html pages (14 distinct sites) used what looked like CC RDFa. (In total only 35 pages sent application/xhtml+xml (in response to an Accept header that matches Firefox 3), so I haven't bothered setting up my tools to parse those and I only looked at text/html.)

http://philip.html5.org/data/cc-rdfa-extracts-2.txt has the list of pages. Two have incorrect attributionURL (forgetting the "http://"). One has an incorrect license URL (inserting whitespace in the attribute value). Two have an attributionName that sounds to me like they didn't quite understand what it was for, though I could be mistaken.

This is very little data to go on, but the error rate (in terms of being able to extract correct licensing data) seems to be around tens of percents.

>> Somewhat relatedly, there's another four pages that use rel="dc:type".
>> One of those (http://bytestrike.blogspot.com/) has it near a CC license
>> link and does not have an xmlns:dc declaration anywhere, suggesting a
>> copy-and-paste error.
>
> Looks like it's broken in a more subtle way:
>
> ====
> <span dc="http://purl.org/dc/elements/1.1/"
> href="http://purl.org/dc/dcmitype/Text" rel="dc:type">
> ====
>
> it says "dc" instead of "xmlns:dc". Wonder how that happened.

http://ambienteecologico.blogspot.com/ has the same problem. http://www.ratemyfish.com/ uses property="dc:*" without even attempting to declare "dc". 38 pages use some-rdfa-curie-attribute="dc:*" with the proper xmlns:dc.
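
For illustration only, the kind of check described above (a CURIE prefix such as "dc" used without a matching xmlns:dc declaration in scope) could be sketched roughly as follows. This is not the tooling used to produce the numbers in this message; the class and function names are hypothetical, the list of CURIE-bearing attributes is a simplified assumption, and the scope tracking does not try to recover from mismatched tags the way a real text/html parser would.

```python
from html.parser import HTMLParser

# Assumption: a simplified set of RDFa attributes whose values hold CURIEs.
CURIE_ATTRS = {"rel", "rev", "property", "typeof", "datatype"}

# HTML void elements never get an end tag, so they must not open a scope.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


class PrefixChecker(HTMLParser):
    """Collects CURIEs whose prefix has no xmlns:prefix declaration in scope."""

    def __init__(self):
        super().__init__()
        self.scopes = [set()]  # stack of in-scope prefix sets
        self.unbound = []      # (tag, attribute, curie) tuples

    def handle_starttag(self, tag, attrs):
        # Prefixes declared on this element, e.g. xmlns:dc -> "dc".
        declared = {name[6:] for name, _ in attrs if name.startswith("xmlns:")}
        in_scope = self.scopes[-1] | declared
        for name, value in attrs:
            if name in CURIE_ATTRS and value:
                for token in value.split():
                    prefix, sep, _ = token.partition(":")
                    if sep and prefix and prefix not in in_scope:
                        self.unbound.append((tag, name, token))
        if tag not in VOID:
            self.scopes.append(in_scope)

    def handle_endtag(self, tag):
        # Naive scope handling: assumes start and end tags are balanced.
        if tag not in VOID and len(self.scopes) > 1:
            self.scopes.pop()


def find_unbound_prefixes(html):
    checker = PrefixChecker()
    checker.feed(html)
    checker.close()
    return checker.unbound


# The markup quoted above declares "dc" as a plain attribute, not xmlns:dc.
broken = ('<span dc="http://purl.org/dc/elements/1.1/" '
          'href="http://purl.org/dc/dcmitype/Text" rel="dc:type">Text</span>')
print(find_unbound_prefixes(broken))
```

Plain link types such as rel="license" contain no colon and are therefore ignored by this sketch; only prefixed values are checked against the declarations in scope.
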

The only other correctly-bound RDFa attribute values are xmlns:myspace="http://x.myspacecdn.com/modules/sitesearch/static/rdf/profileschema.rdf#", all on myspace.com

--
Philip Taylor
pjt47@cam.ac.uk
Received on Wednesday, 28 January 2009 01:17:24 UTC