- From: Dan Brickley <danbri@danbri.org>
- Date: Tue, 7 Dec 2010 17:06:14 +0100
- To: Paul Groth <pgroth@few.vu.nl>, Toby Inkster <mail@tobyinkster.co.uk>
- Cc: Damian Steer <pldms@mac.com>, www-archive@w3.org
(cc-ing www-archive so I can find these notes again...)

Ok, so I'm just blundering around two or three related design spaces here, with the skeleton of a use case... exploring http://buzzword.org.uk/2009/rdfa4/spec mixed with Paul's example.

http://svn.foaf-project.org/foaftown/2010/prov/ ...has surf3.html, which was derived from Paul's OPMV experiments and still uses that style of markup. Oh, I added in some more factual info (age, gender), since otherwise I don't see any value in triple-level provenance; a simple 'based in part on' chain would be fine.

./graphit.pl is a script using Toby's graph-naming RDFa Perl parser.

nyt-example.html is an imaginary file about this (real) Kelly character, which I imagine the NYT hosting. It says "here's some stuff which we sourced from Freebase, and here's some stuff we're just telling you", and it has pointers off to Freebase, who have their own RDF and sourcing thing going on, we might hope.

A side theme here is distinguishing static properties whose, erm, facticity doesn't change over time (dates of birth) from those that go stale predictably and quite quickly (e.g. age). So if we can crawl back along the provenance trail to find dateOfBirth instead of age, that's kinda nice.

    rapper -i rdfa nyt-example.html http://nyt.example.com/people/kelly_slater/
    rapper: Parsing URI file:///Users/danbri/working/foaf/foaftown/2010/prov/nyt-example.html with parser rdfa and base URI http://nyt.example.com/people/kelly_slater/
    rapper: Serializing with serializer ntriples and base URI http://nyt.example.com/people/kelly_slater/
    <http://nyt.example.com/people/kelly_slater/#id> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
    <http://nyt.example.com/people/kelly_slater/#id> <http://xmlns.com/foaf/0.1/name> "Kelly Slater" .
    <http://nyt.example.com/people/kelly_slater/#id> <http://xmlns.com/foaf/0.1/age> "38" .
    <http://nyt.example.com/people/kelly_slater/#id> <http://xmlns.com/foaf/0.1/dateOfBirth> "1972-02-11" .
    <http://nyt.example.com/people/kelly_slater/#id> <http://xmlns.com/foaf/0.1/gender> "male" .
    <http://nyt.example.com/people/kelly_slater/> <http://xmlns.com/foaf/0.1/primaryTopic> <http://nyt.example.com/people/kelly_slater/#id> .
    <http://nyt.example.com/people/kelly_slater/#from_freebase> <http://purl.org/dc/terms/source> <http://www.freebase.com/view/en/kelly_slater> .
    <http://www.nytimes.com/2009/09/18/sports/18surfing.html> <http://xmlns.com/foaf/0.1/primaryTopic> <http://nyt.example.com/people/kelly_slater/#id> .
    <http://www.nytimes.com/2010/11/14/sports/14surfing.html> <http://xmlns.com/foaf/0.1/primaryTopic> <http://nyt.example.com/people/kelly_slater/#id> .
    <http://www.nytimes.com/2006/08/20/sports/playmagazine/20slater-irons.html> <http://xmlns.com/foaf/0.1/topic> <http://nyt.example.com/people/kelly_slater/#id> .

    rapper -i rdfa surf3.html
    rapper: Parsing URI file:///Users/danbri/working/foaf/foaftown/2010/prov/surf3.html with parser rdfa
    rapper: Serializing with serializer ntriples
    <http://opmv.googlecode.com/svn/trunk/js/example/> <http://www.w3.org/1999/xhtml/vocab#stylesheet> <http://opmv.googlecode.com/svn/trunk/js/example/./style.css> .
    <http://opmv.googlecode.com/svn/trunk/js/example/> <http://www.w3.org/1999/xhtml/vocab#meta> <http://opmv.googlecode.com/svn/trunk/js/example/#> .
    <http://opmv.googlecode.com/svn/trunk/js/example/#quote> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/net/opmv/ns#Artifact> .
    _:bnode0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
    _:bnode0 <http://xmlns.com/foaf/0.1/name> "Kelly Slater" .
    _:bnode0 <http://xmlns.com/foaf/0.1/age> "38" .
    <http://opmv.googlecode.com/svn/trunk/js/example/#aggregation> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/net/opmv/ns#Process> .
    <http://opmv.googlecode.com/svn/trunk/js/example/#aggregation> <http://purl.org/net/opmv/ns#used> <http://www.nytimes.com/2010/03/14/sports/14surf.html> .
    <http://opmv.googlecode.com/svn/trunk/js/example/#aggregation> <http://purl.org/net/opmv/ns#wasPerformedBy> "John Smith" .
    <http://opmv.googlecode.com/svn/trunk/js/example/#quote> <http://purl.org/net/opmv/ns#wasGeneratedBy> <http://opmv.googlecode.com/svn/trunk/js/example/#aggregation> .

That's the flat triples view. But say we wanted to drill into the Web from the blog post and figure out "ok, so is this Kelly really 38?" (apologies to Kelly if you're googling yourself, btw). Is the graph= addition to RDFa any use?

The Perl script in there runs Toby's parser, which partitions the above triples into different graphs/buckets. Useful? Well, I don't know. Intriguing, for sure. There are also lots of design options around which URIs to use, and it's complicated here because I rigged surf3.html to use Paul's repo as its base href, so that the CSS and image relative links work.

Here's how surf3.html partitions itself:

    TellyClub:prov danbri$ ./graphit.pl surf3.html
    # Graph URI: _:RDFaDefaultGraph
    <http://opmv.googlecode.com/svn/trunk/js/example/> <http://www.w3.org/1999/xhtml/vocab#stylesheet> <http://opmv.googlecode.com/svn/trunk/js/example/style.css> .
    # Graph URI: http://opmv.googlecode.com/svn/trunk/js/example/nyt-example.html
    [] a <http://xmlns.com/foaf/0.1/Person> ;
        <http://xmlns.com/foaf/0.1/age> "38" ;
        <http://xmlns.com/foaf/0.1/name> "Kelly Slater" .
    <http://opmv.googlecode.com/svn/trunk/js/example/#aggregation> <http://purl.org/net/opmv/ns#used> <http://www.nytimes.com/2010/03/14/sports/14surf.html> ;
        <http://purl.org/net/opmv/ns#wasPerformedBy> "John Smith" ;
        a <http://purl.org/net/opmv/ns#Process> .
    <http://opmv.googlecode.com/svn/trunk/js/example/#quote> <http://purl.org/net/opmv/ns#wasGeneratedBy> <http://opmv.googlecode.com/svn/trunk/js/example/#aggregation> ;
        a <http://purl.org/net/opmv/ns#Artifact> .

...this separates the triples it claims to have gotten from the NYT from the other stuff in the page.
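The bucketing step itself is simple once a graph-aware parser has done the hard part. graphit.pl and Toby's parser are Perl and are not reproduced here; what follows is only a minimal Python sketch of the partitioning, assuming the parser has already produced (subject, predicate, object, graph) quads. The quads are abbreviated by hand from the surf3.html output above, and the helper names (`partition`, `graphs_asserting`, `dump`) are hypothetical.

```python
from collections import defaultdict

# Hand-abbreviated quads from the surf3.html output, as
# (subject, predicate, object, graph). Literals and blank nodes are kept in
# N-Triples form; "_:RDFaDefaultGraph" marks triples with no explicit graph.
EX = "http://opmv.googlecode.com/svn/trunk/js/example/"
OPMV = "http://purl.org/net/opmv/ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
NYT_GRAPH = EX + "nyt-example.html"
DEFAULT = "_:RDFaDefaultGraph"

QUADS = [
    (EX, "http://www.w3.org/1999/xhtml/vocab#stylesheet", EX + "style.css", DEFAULT),
    ("_:b0", FOAF + "name", '"Kelly Slater"', NYT_GRAPH),
    ("_:b0", FOAF + "age", '"38"', NYT_GRAPH),
    (EX + "#aggregation", OPMV + "used",
     "http://www.nytimes.com/2010/03/14/sports/14surf.html", NYT_GRAPH),
    (EX + "#quote", OPMV + "wasGeneratedBy", EX + "#aggregation", NYT_GRAPH),
]


def partition(quads):
    """Group quads into {graph: [(s, p, o), ...]}, preserving input order."""
    graphs = defaultdict(list)
    for s, p, o, g in quads:
        graphs[g].append((s, p, o))
    return dict(graphs)


def graphs_asserting(quads, predicate, obj):
    """Return the (sorted) graphs in which some subject has this predicate/object."""
    return sorted({g for _, p, o, g in quads if p == predicate and o == obj})


def term(t):
    """Wrap IRIs in <...>; literals and blank nodes are already serialised."""
    return t if t.startswith(('"', "_:")) else f"<{t}>"


def dump(graphs):
    """Print one '# Graph URI:' section per graph (as N-Triples, not Turtle)."""
    for g, triples in graphs.items():
        print(f"# Graph URI: {g}")
        for s, p, o in triples:
            print(f"{term(s)} {term(p)} {term(o)} .")


if __name__ == "__main__":
    dump(partition(QUADS))
    # "Is this Kelly really 38?" starts with: which graph says so?
    print(graphs_asserting(QUADS, FOAF + "age", '"38"'))
```

Asking which graph asserts the age triple is the first hop of the "is this Kelly really 38?" question; the graph name then becomes the next document to look at.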
The graph URI is nyt-example.html, so let's look at that now:

    TellyClub:prov danbri$ ./graphit.pl nyt-example.html
    # Graph URI: http://nyt.example.com/people/kelly_slater/#catalog
    <http://www.nytimes.com/2006/08/20/sports/playmagazine/20slater-irons.html> <http://xmlns.com/foaf/0.1/topic> <http://nyt.example.com/people/kelly_slater/#id> .
    <http://www.nytimes.com/2009/09/18/sports/18surfing.html> <http://xmlns.com/foaf/0.1/primaryTopic> <http://nyt.example.com/people/kelly_slater/#id> .
    <http://www.nytimes.com/2010/11/14/sports/14surfing.html> <http://xmlns.com/foaf/0.1/primaryTopic> <http://nyt.example.com/people/kelly_slater/#id> .
    # Graph URI: http://nyt.example.com/people/kelly_slater/#from_freebase
    <http://nyt.example.com/people/kelly_slater/> <http://xmlns.com/foaf/0.1/primaryTopic> <http://nyt.example.com/people/kelly_slater/#id> .
    <http://nyt.example.com/people/kelly_slater/#from_freebase> <http://purl.org/dc/terms/source> <http://www.freebase.com/view/en/kelly_slater> .
    <http://nyt.example.com/people/kelly_slater/#id> a <http://xmlns.com/foaf/0.1/Person> ;
        <http://xmlns.com/foaf/0.1/age> "38" ;
        <http://xmlns.com/foaf/0.1/dateOfBirth> "1972-02-11" ;
        <http://xmlns.com/foaf/0.1/gender> "male" ;
        <http://xmlns.com/foaf/0.1/name> "Kelly Slater" .

...again, this separates the things the NYT is supposedly telling us (e.g. metadata about its catalogue of articles) from the facts it associates (via dc:source here) with Freebase.

Now, both of these pages could be lying or mistaken, of course, as could Freebase. The appeal I see in partitioning the RDFa into graph'd chunks is that we can associate a dc:source with each bit of info. I'm not entirely sure how OPMV fits in here, but that's not surprising, as I've plenty of reading left to do.

So re named graph URIs: we'd probably not want to use the proposed URIs directly when loading into a quad store, but use some generated UUID or whatever instead, so that mischievous names would be harmless.
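The "generated UUID instead of the page's proposed graph name" idea can be sketched in a few lines. This is a minimal Python illustration, not any quad store's API: it assumes quads arrive as (subject, predicate, object, graph) tuples, mints `urn:uuid:` names with the standard `uuid` module, and records the page's claimed name and the document it was parsed from using two made-up predicates (`claimedGraphName`, `parsedFrom` under `http://example.org/ns#`), which are purely hypothetical.

```python
import uuid

# Hypothetical bookkeeping predicates (not from any spec).
CLAIMED_NAME = "http://example.org/ns#claimedGraphName"
PARSED_FROM = "http://example.org/ns#parsedFrom"

PAGE = "http://nyt.example.com/people/kelly_slater/"
FOAF = "http://xmlns.com/foaf/0.1/"

# A few quads from the nyt-example.html partitioning above.
SAMPLE = [
    (PAGE + "#id", FOAF + "age", '"38"', PAGE + "#from_freebase"),
    (PAGE + "#id", FOAF + "dateOfBirth", '"1972-02-11"', PAGE + "#from_freebase"),
    ("http://www.nytimes.com/2009/09/18/sports/18surfing.html",
     FOAF + "primaryTopic", PAGE + "#id", PAGE + "#catalog"),
]


def new_graph_name():
    """Mint a fresh graph name in the urn:uuid: namespace."""
    return uuid.uuid4().urn


def rename_graphs(quads, fetched_from, mint=new_graph_name):
    """Swap page-proposed graph names for locally minted ones.

    Returns (renamed_quads, metadata). For each minted name, the metadata
    keeps the name the page proposed and the document it was parsed from, so
    nothing is lost, but a page can no longer write into a graph whose name
    was chosen to clash with someone else's.
    """
    mapping = {}
    renamed = []
    for s, p, o, g in quads:
        if g not in mapping:
            mapping[g] = mint()
        renamed.append((s, p, o, mapping[g]))
    metadata = []
    for claimed, minted in mapping.items():
        metadata.append((minted, CLAIMED_NAME, claimed))
        metadata.append((minted, PARSED_FROM, fetched_from))
    return renamed, metadata


if __name__ == "__main__":
    quads, meta = rename_graphs(SAMPLE, PAGE)
    for q in quads:
        print(q)
    for m in meta:
        print(m)
```

One design choice worth flagging: statements the page makes *about* its own graph name (such as the `#from_freebase dc:source` triple) keep their original subject here, so a consumer has to join through `claimedGraphName` to connect them to the minted graph.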
But this does seem to suggest ways of pointing back down the chain to source files, and maybe even detecting loops? It's easy to imagine a Wikipedia page acquiring a 'source' pointer to the NYT article, not realising that the NYT sourced it from Freebase, which got it from Wikipedia in the first place. So this is all intriguing, but it also gives me the feeling it might be a bit fragile...

</thinking_out_loud>

cheers,

Dan

ps. I had a similar experiment last year, http://svn.foaf-project.org/foaftown/2009/headstream/readme.txt ... which was about separating the things some social site says about the user (and may have generated w/ stats, fact-checked etc.) from the things users say about themselves. In the absence of named-graph RDFa I used SPARQL CONSTRUCTs to implement the partitioning. Kinda worked.
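The loop scenario in the message (a Wikipedia page citing the NYT, which got the fact from Freebase, which got it from Wikipedia) amounts to cycle detection over source pointers. Below is a minimal Python sketch using depth-first search, assuming the dc:source-style pointers have already been harvested into a dict mapping each document to the documents it cites. The Wikipedia URL and the three edges are illustrative assumptions that mirror the imagined scenario, not real data.

```python
# Illustrative "where did you get this?" pointers: document -> its sources.
# The Wikipedia URL and these particular edges are assumptions mirroring the
# hypothetical loop described in the message.
WIKIPEDIA = "http://en.wikipedia.org/wiki/Kelly_Slater"
NYT = "http://nyt.example.com/people/kelly_slater/"
FREEBASE = "http://www.freebase.com/view/en/kelly_slater"

SOURCES = {
    WIKIPEDIA: [NYT],
    NYT: [FREEBASE],
    FREEBASE: [WIKIPEDIA],
}


def find_source_loop(sources, start):
    """Depth-first walk along source pointers from `start`.

    Returns the first cycle found as a list of documents whose first and last
    entries are the same, or None if every chain bottoms out.
    """
    path, on_path, finished = [], set(), set()

    def visit(doc):
        if doc in on_path:            # back on the current chain: a loop
            return path[path.index(doc):] + [doc]
        if doc in finished:           # already fully explored, no loop there
            return None
        path.append(doc)
        on_path.add(doc)
        for src in sources.get(doc, []):
            loop = visit(src)
            if loop:
                return loop
        path.pop()
        on_path.discard(doc)
        finished.add(doc)
        return None

    return visit(start)


if __name__ == "__main__":
    print(find_source_loop(SOURCES, WIKIPEDIA))
```

This only finds loops that the published pointers reveal: if any page in the chain omits its source, the cycle is invisible, which is one concrete form of the fragility worried about above.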
Received on Tuesday, 7 December 2010 16:06:49 UTC