- From: Graham Klyne <graham.klyne@zoo.ox.ac.uk>
- Date: Thu, 20 Oct 2011 14:18:02 +0100
- To: W3C provenance WG <public-prov-wg@w3.org>
I was aiming to draft something about entity attributes to flesh out the relationship between resources and provenance in PAQ, but ran into some devil-in-the-detail problems, which I'm trying to explore here...

This message arises in part from an off-line discussion with Stian, whom I thank for pointing me to important information about how the provenance model plays out in RDF, and for explaining some of its implications. However, any errors and misapprehensions in what follows are all mine.

== Provenance and properties of entities in RDF ==

I've been looking at how provenance expressions may be represented in RDF, and how such a representation interacts with the attributes of an entity. For the purposes of this discussion, I'll use a statement with dcterms:creator as an example:

    ex:aDocument a prov:Entity ;
        dcterms:creator "Meritorious Meerkat" .

The RDF statement with property dcterms:creator can be interpreted as an attribute of the entity, *and* as an expression of provenance about the entity.

To express the above as provenance using the provenance vocabulary as currently defined, we need to introduce a new class, a subclass of prov:ProcessExecution, e.g.:

    ex:DocumentCreation rdfs:subClassOf prov:ProcessExecution .

    ex:aDocument a prov:Entity ;
        prov:wasGeneratedBy [
            a ex:DocumentCreation ;
            prov:wasControlledBy [
                a prov:Agent ;
                foaf:name "Meritorious Meerkat"
            ]
        ] .

I observe:

(a) this structure is quite similar to the sort of event-mediated structures that occur when using CIDOC-CRM [1];

(b) the structure is quite complex compared with the original example.

[1] http://www.cidoc-crm.org/docs/fin-paper.pdf

I'm not saying these are problems, but I am trying to explore the landscape from an implementer's perspective.
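To make the relationship between the two forms concrete, here is a minimal Python sketch (not part of any PROV specification) that expands a simple dcterms:creator statement into the event-mediated structure shown above. It assumes triples are plain (subject, predicate, object) string tuples rather than using an RDF library, writes Turtle's `a` as `rdf:type`, and uses a hypothetical mapping rule, function names and blank-node labelling chosen only for illustration.

```python
from itertools import count

# A triple is a (subject, predicate, object) tuple of CURIE strings;
# literals keep their surrounding double quotes, as in Turtle.
Triple = tuple[str, str, str]

_bnode_ids = count(1)


def fresh_bnode() -> str:
    """Return a new, unused blank-node label such as '_:b1'."""
    return f"_:b{next(_bnode_ids)}"


def expand_creator(triples: list[Triple]) -> list[Triple]:
    """Rewrite each `dcterms:creator` statement into the event-mediated
    prov:wasGeneratedBy / prov:wasControlledBy form (hypothetical rule)."""
    out: list[Triple] = []
    expanded = False
    for s, p, o in triples:
        if p != "dcterms:creator":
            out.append((s, p, o))  # unrelated statements pass through unchanged
            continue
        expanded = True
        execution, agent = fresh_bnode(), fresh_bnode()
        out += [
            (s, "prov:wasGeneratedBy", execution),
            (execution, "rdf:type", "ex:DocumentCreation"),
            (execution, "prov:wasControlledBy", agent),
            (agent, "rdf:type", "prov:Agent"),
            (agent, "foaf:name", o),
        ]
    if expanded:
        # Declare the process-execution subclass used above (once per call).
        out.append(("ex:DocumentCreation", "rdfs:subClassOf", "prov:ProcessExecution"))
    return out


simple = [
    ("ex:aDocument", "rdf:type", "prov:Entity"),
    ("ex:aDocument", "dcterms:creator", '"Meritorious Meerkat"'),
]
for triple in expand_creator(simple):
    print(triple)
```

Each simple attribute becomes five triples plus a class declaration, which gives a rough feel for observation (b) about relative complexity.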
I think it is reasonable that applications with a special interest in generating and/or consuming provenance information (workflow enactment systems come to mind) will generate and work with the more complex format. My experience with using CIDOC-CRM in RDF suggests that some additional steps may be needed if processing of such data is to scale, but I don't see that as a primary concern at this juncture.

My main concerns are that we also want to be able to capture and use provenance information that is generated incidentally by applications with no primary interest in provenance, and that such provenance information should likewise be accessible to applications that don't care about its intricacies. These applications would easily generate and consume statements like the original dcterms:creator example, but may be less able to deal with the more complex structures of the provenance vocabulary.

In my mind, this raises the following questions:

(1) Is the full complexity of the current provenance model structure actually needed? I think it probably is, but I feel it's worth reflecting on and asking the question.

(2) Should we look to technical mechanisms to define the relationship between simple provenance-as-attributes and fully-modeled provenance statements (e.g., relating the two examples given above)?

(3) Rather than defining an all-new vocabulary, should we consider basing the mapping of the abstract model to RDF on a subset of the CIDOC-CRM model structures? (I don't think this would affect PROV-DM, but it could affect many of the terms used in PROV-O, and cause some of the mapped structures in RDF to change.)
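One possible shape for the mechanism in question (2) is a set of mapping rules that can run in either direction. The following minimal Python sketch shows one hypothetical rule in the "collapse" direction: it derives a simple dcterms:creator statement from the fully modeled structure, so that an application uninterested in provenance details could still consume it. Triples are assumed to be plain string tuples; the function name, the `creation_class` parameter, and the decision to read only ex:DocumentCreation executions as authorship are illustrative assumptions, not anything defined by PROV-O.

```python
# A triple is a (subject, predicate, object) tuple of CURIE strings;
# literals keep their surrounding double quotes, as in Turtle.
Triple = tuple[str, str, str]


def collapse_creator(
    triples: list[Triple], creation_class: str = "ex:DocumentCreation"
) -> list[Triple]:
    """Derive simple `dcterms:creator` statements from the fully modeled form:

        ?entity prov:wasGeneratedBy ?exec .
        ?exec   a ?creation_class ; prov:wasControlledBy ?agent .
        ?agent  foaf:name ?name .
        ==>  ?entity dcterms:creator ?name .
    """
    # Executions explicitly typed with the (hypothetical) creation class.
    creations = {s for s, p, o in triples if p == "rdf:type" and o == creation_class}
    controllers: dict[str, list[str]] = {}
    names: dict[str, list[str]] = {}
    for s, p, o in triples:
        if p == "prov:wasControlledBy":
            controllers.setdefault(s, []).append(o)
        elif p == "foaf:name":
            names.setdefault(s, []).append(o)

    derived: list[Triple] = []
    for entity, p, execution in triples:
        # Only generation by a creation-type execution is read as authorship.
        if p != "prov:wasGeneratedBy" or execution not in creations:
            continue
        for agent in controllers.get(execution, []):
            for name in names.get(agent, []):
                derived.append((entity, "dcterms:creator", name))
    return list(dict.fromkeys(derived))  # drop duplicates, keep order


full = [
    ("ex:aDocument", "rdf:type", "prov:Entity"),
    ("ex:aDocument", "prov:wasGeneratedBy", "_:e1"),
    ("_:e1", "rdf:type", "ex:DocumentCreation"),
    ("_:e1", "prov:wasControlledBy", "_:a1"),
    ("_:a1", "rdf:type", "prov:Agent"),
    ("_:a1", "foaf:name", '"Meritorious Meerkat"'),
]
print(collapse_creator(full))
# → [('ex:aDocument', 'dcterms:creator', '"Meritorious Meerkat"')]
```

Restricting the rule to a specific process-execution class is a deliberate choice in this sketch: treating every controlling agent of any generating activity as a "creator" would probably over-generate simple attributes.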
At the very least (and I think this echoes Ivan Herman's recent email to the group [2]), we need to find a way to make clear how the simple attributes relate to the defined provenance model, and perhaps provide some guidelines to help provenance-aware applications interpret and/or generate simple attributes that happen to express provenance information.

[2] http://lists.w3.org/Archives/Public/public-prov-wg/2011Oct/0140.html

#g
Received on Thursday, 20 October 2011 13:20:17 UTC