Re: Discussion with Ian and Henri about HTML5+RDFa (part 2/2)

Ben Adida wrote:
> Philip Taylor wrote:
> 
>> I should probably try downloading some more recent pages, to see if
>> CC/RDFa usage is more common now...
> 
> If you have time and resources to do so, that would be very useful!

I downloaded another 130K pages from dmoz.org (using their latest list
of pages, so it will be biased differently from the previous set I
used). Of those, 25 text/html pages (14 distinct sites) used what looked
like CC RDFa.

(In total only 35 pages sent application/xhtml+xml (in response to an 
Accept header that matches Firefox 3), so I haven't bothered setting up 
my tools to parse those and I only looked at text/html.)

http://philip.html5.org/data/cc-rdfa-extracts-2.txt has the list of 
pages. Two have incorrect attributionURL (forgetting the "http://"). One 
has an incorrect license URL (inserting whitespace in the attribute 
value). Two have an attributionName that sounds to me like they didn't 
quite understand what it was for, though I could be mistaken. This is
very little data to go on, but the error rate (in terms of being able to
extract correct licensing data) seems to be in the tens of percent.
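
A minimal sketch of how the two URL problems above (missing "http://"
and whitespace in the attribute value) could be flagged automatically.
The function name check_cc_url and the example values are hypothetical;
this is not the tool used for the survey, and it makes no attempt to
judge whether an attributionName is sensible.

```python
from urllib.parse import urlsplit


def check_cc_url(value):
    """Return a list of problems found in a license/attributionURL value.

    Hypothetical helper: it only covers the two mistakes described above.
    """
    problems = []
    # Any whitespace (leading, trailing, or embedded) makes the value
    # unusable as a URL without guessing what the author meant.
    if any(ch.isspace() for ch in value):
        problems.append("contains whitespace")
    # Without a scheme the value would be resolved as a relative URL.
    if urlsplit(value.strip()).scheme not in ("http", "https"):
        problems.append("missing http:// scheme")
    return problems


if __name__ == "__main__":
    for candidate in (
        "http://creativecommons.org/licenses/by/3.0/",
        "www.example.com/me",
        " http://creativecommons.org/licenses/by/3.0/ ",
    ):
        print(repr(candidate), check_cc_url(candidate))
```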

>> Somewhat relatedly, there's another four pages that use rel="dc:type".
>> One of those (http://bytestrike.blogspot.com/) has it near a CC license
>> link and does not have an xmlns:dc declaration anywhere, suggesting a
>> copy-and-paste error.
> 
> Looks like it's broken in a more subtle way:
> 
> ====
> <span dc="http://purl.org/dc/elements/1.1/"
> href="http://purl.org/dc/dcmitype/Text" rel="dc:type">
> ====
> 
> it says "dc" instead of "xmlns:dc". Wonder how that happened.

http://ambienteecologico.blogspot.com/ has the same problem.

http://www.ratemyfish.com/ uses property="dc:*" without even attempting 
to declare "dc".

38 pages use some-rdfa-curie-attribute="dc:*" with the proper xmlns:dc.
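
The three cases above (prefix declared via xmlns:dc, declared as a bare
"dc" attribute, or not declared at all) can be told apart with a check
along these lines. This is only a sketch: classify_prefix is a
hypothetical name, and attrs_in_scope is assumed to be a dict of the
lowercased attribute names collected from the element and its
ancestors, which a real parser would have to supply.

```python
def classify_prefix(curie, attrs_in_scope):
    """Classify how the prefix of a CURIE such as "dc:type" is declared.

    attrs_in_scope: dict mapping attribute names (from the element and
    its ancestors) to their values. Assumed input, not a real parser API.
    """
    prefix, sep, _ = curie.partition(":")
    if not sep:
        return "not a CURIE"
    if "xmlns:" + prefix in attrs_in_scope:
        return "bound"
    # e.g. dc="http://purl.org/dc/elements/1.1/" instead of xmlns:dc="..."
    if prefix in attrs_in_scope:
        return "misdeclared"
    return "undeclared"


if __name__ == "__main__":
    span_attrs = {
        "dc": "http://purl.org/dc/elements/1.1/",
        "href": "http://purl.org/dc/dcmitype/Text",
        "rel": "dc:type",
    }
    print(classify_prefix("dc:type", span_attrs))
```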

The only other RDFa attribute values with a correctly bound prefix use
xmlns:myspace="http://x.myspacecdn.com/modules/sitesearch/static/rdf/profileschema.rdf#",
all on myspace.com.

-- 
Philip Taylor
pjt47@cam.ac.uk

Received on Wednesday, 28 January 2009 01:17:24 UTC