- From: Dave Reynolds <der@hplb.hpl.hp.com>
- Date: Wed, 27 Feb 2002 12:14:27 +0000
- To: "RDF Interest (E-mail)" <www-rdf-interest@w3.org>
We are working on a semantic web related application that needs some provenance support. We have various routes for doing this but would be interested in hearing of others' experiences. Are there any groups out there that have developed applications supporting provenance within RDF and would be willing to share what worked well or badly?

To explain a little: we are developing a semantic web application for shared information management. In this application users are able to attach personal metadata to items and to view the "soup" of metadata created by many users. For example, the same item might have many different dc:title fields created by different users, and the UI should be able to view this data and give responses like 'most users call this "foo" but one user prefers to call it "bar"'.

To support this we want fine-grained tracking of where the multiple metadata values came from, down to the level of individual RDF assertions. The tracking data could include items like creator, date and digital signature; these terms would be defined in a separate provenance schema/ontology.

We are exploring three approaches to doing this: application level, reification and out-of-band. Each of these has pros and cons.

** Application level

Treat provenance as a data modeling problem at the application level and introduce bNodes to which the provenance can be attached. Thus instead of:

    subj --pred--> obj

for any provenanced (is that a word? :-) values use:

    subj --pred--> <> --rdf:value--> obj
                      --pv:creator--> "Dave"
                      --pv:date--> "27/2/02"

This has the advantage of flexibility and means we can query provenance data conveniently using existing RDF query languages (RDQL in our case). However, as far as we know this is not a standard idiom, and that might make it harder to interoperate with other RDF metadata sources.
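As a rough sketch of the bNode idiom above, and of the 'most users call this "foo"' summary it is meant to support, something like the following could work. It assumes triples are held as plain Python tuples in a set rather than in any real RDF toolkit. The helper names, the "_:bN" labels, the "ex:item1" item and the users other than "Dave" are all made up for illustration.

```python
from collections import Counter
from itertools import count

_ids = count(1)  # source of fresh blank-node labels (hypothetical scheme)

def new_bnode():
    """Return a fresh blank-node label such as '_:b1'."""
    return f"_:b{next(_ids)}"

def add_with_provenance(store, subj, pred, obj, creator, date):
    """Add subj --pred--> <> and hang rdf:value and pv:* off the bNode."""
    node = new_bnode()
    store.add((subj, pred, node))
    store.add((node, "rdf:value", obj))
    store.add((node, "pv:creator", creator))
    store.add((node, "pv:date", date))
    return node

def values_by_creator(store, subj, pred):
    """Map each creator to the value they asserted for (subj, pred)."""
    result = {}
    for s, p, node in store:
        if s != subj or p != pred:
            continue
        # Collect the properties attached to this intermediate node.
        props = {p2: o2 for s2, p2, o2 in store if s2 == node}
        if "rdf:value" in props and "pv:creator" in props:
            result[props["pv:creator"]] = props["rdf:value"]
    return result

def summarise(store, subj, pred):
    """Return (value, number_of_users) pairs, most popular first."""
    counts = Counter(values_by_creator(store, subj, pred).values())
    return counts.most_common()

store = set()
add_with_provenance(store, "ex:item1", "dc:title", "foo", "Dave", "27/2/02")
add_with_provenance(store, "ex:item1", "dc:title", "foo", "Alice", "26/2/02")
add_with_provenance(store, "ex:item1", "dc:title", "bar", "Bob", "25/2/02")
print(summarise(store, "ex:item1", "dc:title"))  # [('foo', 2), ('bar', 1)]
```

Note the cost visible even in this toy version: each provenanced value takes four triples instead of one, and any consumer that expects dc:title to point directly at a literal has to know about the extra hop through rdf:value.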
** Reification

Clearly the official RDF mechanism for representing provenance is to use reification and attach the same "pv:*" assertions to a node denoting the reified statement.

This has the advantage of being the standard idiom at present; however, the uncertain status of reification with the RDFCore WG leaves us nervous. We can still query provenance data, though the queries would look rather more ugly and verbose than with the application-level approach. The sheer number of triples needed is high, but (a) it is too early to optimize for performance and (b) we can in any case hide the overhead by implementing a triple store which pretends to reify but in fact uses a more compact representation.

** Out of band

In this option we simply make provenance support a property of the API. We don't change the RDF assertions in the main fact base at all. Instead we provide API calls to attach and retrieve annotations from any RDF assertion. This is related to the "quad" notion discussed on this list some time ago and to the N3 approach in which every statement has an internal context attribute.

This has the advantage that it hides the mechanics of provenance, allowing us to keep the application code stable even if the implementation idiom changes. It has the disadvantage that we'd need to extend our query support to access this additional API layer, and it is at best unhelpful for integrating with other RDF data sources.

For our current purposes we will simply pick one and work with it, but if anyone else has already trodden this path and has experiences to share then we'd love to hear from them.

Dave
Received on Wednesday, 27 February 2002 07:14:33 UTC