- From: Danny Ayers <danny.ayers@gmail.com>
- Date: Fri, 20 Apr 2007 11:50:07 +0200
- To: linking-open-data@simile.mit.edu
- Cc: www-archive@w3.org
A dataset that might be useful for test/experimental purposes - my personal blog, from early 2003 to the present:

http://dannyayers.com:88/data/raw_2007-04-20.rdf.gz

Rapper reckons 289847 triples. Main vocab is RSS 1.0. It includes visitor comments (with a proportion of spam), Knobot system-specific statements, some FOAF, various other bits & pieces.

Although it's valid RDF/XML, in other respects it's as rough as can be. The content is tag soup HTML. Since I started self-hosting the blog I've switched CMS twice - initially it was Movable Type, then WordPress, now Knobot. If I remember correctly the MT/WP transition 1970'd a lot of the dates, but the raw data is still in there somewhere.

One particular challenge re. exposing this via SPARQL or whatever is that it also contains some email addresses in plain text - these need to be hidden from spammers' harvesters.

License - CC Attribution (i.e. link appreciated if you use the stuff)
http://creativecommons.org/licenses/by/2.5/

There's 2002-2003 blog data at: http://semtext.org/semblog via Blogger - but I've yet to get a dump. (Too busy blogging ;-)

Cheers,
Danny.

--
http://dannyayers.com
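
For anyone wanting to republish the dump, here is one rough way the plain-text email addresses mentioned above might be masked first. It is only a sketch, not anything the dataset or Knobot provides: the function names, the `sha1:` prefix and the deliberately simple address pattern are all assumptions made for illustration. Each address is replaced by the SHA-1 of its mailto: URI, in the style of foaf:mbox_sha1sum, so repeated addresses still map to the same token.

```python
import gzip
import hashlib
import re

# Deliberately simple pattern (an assumption, not a full RFC 5322 matcher).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def mask_email(match):
    """Replace one matched address with a SHA-1 digest of its mailto: URI."""
    # Lowercasing is a choice made here so that case variants of the same
    # address collapse to one token.
    address = match.group(0).lower()
    digest = hashlib.sha1(("mailto:" + address).encode("utf-8")).hexdigest()
    return "sha1:" + digest  # hypothetical marker prefix


def mask_emails(text):
    """Mask every address-like substring in a piece of text."""
    return EMAIL_RE.sub(mask_email, text)


def mask_dump(src_path, dst_path):
    """Stream a gzipped dump line by line, writing a masked gzipped copy."""
    with gzip.open(src_path, "rt", encoding="utf-8") as src, \
         gzip.open(dst_path, "wt", encoding="utf-8") as dst:
        for line in src:
            dst.write(mask_emails(line))
```

This works on the serialized text rather than on the parsed graph, so it will also rewrite addresses inside mailto: URIs and inside the tag soup HTML content. Whether that is acceptable depends on what the data is used for; a graph-level rewrite before loading into a SPARQL store would be the more careful route.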
Received on Friday, 20 April 2007 09:50:17 UTC