Comments on the Provenance Last Call, for discussion.

I took an action to read and comment upon the Provenance Last Call drafts, with a particular focus on the issues raised in their email regarding the "bundle" and "mention" constructs. I have read the documents, rather quickly, and could have many comments on them; but restricting myself to WG-relevant business, this is a preliminary report for discussion tomorrow.

"Bundle" is much easier to understand than "mention". I confess to not (yet) fully understanding what the document says about "mention". However, the notion of bundle is relevant to us by itself. 

A bundle is a named set of provenance statements. In the main document, this is a set of statements written in the special notation PROV-N, but when the entire provenance system is embedded into RDF (via OWL2-RDF), these become named sets of RDF triples, i.e. named RDF graphs.  Two points arise immediately which are relevant to our discussions about named graphs. 

1. The name really does identify the actual RDF graph, not a "labile graph" or a "graph container" or any other changeable thingie. Quoting from [1]:
"a Bundle of PROV-O assertions is an abstract set of RDF triples, and adding or removing a triple creates a distinct Bundle of PROV-O assertions."

2. The identifier of the bundle really is understood to refer to the bundle, so that when it (the identifier) is used elsewhere in other PROV-O RDF assertions, it refers to this bundle and not to anything else. This is because the entire point of having bundles in PROV is to allow provenance assertions to themselves be given a provenance. These graph names are intended to be used in published RDF which may get archived for future use, and the identifiers are intended to be cool enough to be stable. Bundles are first-class entities which can have properties, be mentioned in other documents and so on. In particular, they can be put into classes: that is, there can be classes of particular kinds of RDF named graph, and a given graph can be asserted to be in one of these classes. (PROV-O seems to require this, since while not every named graph is a PROV-O bundle, every PROV-O bundle is a named RDF graph.)
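
To make the two points concrete, here is a minimal sketch in Python, not taken from the PROV documents. It assumes a triple can be modelled as a 3-tuple of strings and a bundle as an immutable set of such tuples; the helper make_bundle and all of the ex: names are hypothetical and purely illustrative.

```python
from typing import Dict, FrozenSet, Tuple

# Assumption: a triple is modelled as (subject, predicate, object) strings.
Triple = Tuple[str, str, str]
Bundle = FrozenSet[Triple]


def make_bundle(*triples: Triple) -> Bundle:
    """Build an immutable set of triples (hypothetical helper)."""
    return frozenset(triples)


# Point 1: the bundle is the abstract set itself, so "adding" a triple
# does not change it; it produces a distinct bundle.
bundle_a = make_bundle(("ex:report", "prov:wasAttributedTo", "ex:alice"))
bundle_b = bundle_a | {("ex:report", "prov:wasGeneratedBy", "ex:writing")}

# Point 2: the name denotes one fixed graph, and the name can itself be
# the subject of further statements, including class membership.
named_graphs: Dict[str, Bundle] = {"ex:bundle1": bundle_a}
statements_about_bundles = {
    ("ex:bundle1", "rdf:type", "prov:Bundle"),
    ("ex:bundle1", "prov:wasAttributedTo", "ex:bob"),
}
```

Here bundle_a is unaffected by the construction of bundle_b, and ex:bundle1 continues to denote bundle_a; a "changed" graph would need a name of its own.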


OK, now for "mention". This is marked as "at risk", and I find it very hard to understand, but here's the definition from [2]:

"An entity e1 may be mentioned in a bundle b, which contains some descriptions about this entity e1: how e1 was generated and used, which activities e1 is involved with, the agents e1 is attributed to, etc. Other bundles may contain other descriptions about the same entity e1. Some applications may want to interpret this entity e1 with respect to the descriptions found in the bundle b it occurs in. To this end, PROV allows a new entity e2 to be created, which is a specialization of the original entity e1, and which presents an additional aspect: the bundle b containing some descriptions of e1. With this relation, applications that process e2 can know that the attributes of e2 may have been computed according to the descriptions of e1 in b."

Got that? 

In order to understand this, you have to be willing to drink a certain metaphysical kool-aid that seems to be incorporated into the PROV framework, which is the idea that entities, i.e. things in general (they deftly avoid the r-word), are different when 'interpreted' differently, or perhaps when viewed 'with respect to' different other things. Thus, PatHayes-as-seen-by-Wikipedia and PatHayes-as-seen-by-Jackie-Hayes are *different* entities, but they are related in a special way, viz. they are both entities of (both about?) the same *thing*. (I hope I am getting this right.) So, suppose there is some bundle of information about PatHayes, for example the Wikipedia entry; then we can consider the entity (PatHayes as mentioned by Wikipedia) which incorporates the Wikipedia information into itself, as a unique new entity, about-the-same-thing related to me, but distinct from me considered as an entity (I think). And this would be a MENTION of me which embodies the Wikipedia 'bundle' information about me.

So, to sum up and ignoring the metaphysics for a moment, a mention in PROV-O is a trinary relationship between two things and a named RDF graph of a certain kind, which holds only when some triple in the graph contains a URI referring to the first of the two things (the entity mentioned in the bundle).
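
The necessary condition in that summary can be sketched as follows. This is only my reading, not the PROV definition: it assumes triples are 3-tuples of strings and that "contains a URI referring to" can be approximated by string equality on any position of a triple. The function name and the ex: identifiers are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple

# Assumption: a triple is modelled as (subject, predicate, object) strings.
Triple = Tuple[str, str, str]


def mention_precondition_holds(
    mentioned: str,
    bundle_name: str,
    named_graphs: Dict[str, FrozenSet[Triple]],
) -> bool:
    """True if some triple in the named bundle uses the mentioned entity's URI.

    This checks only the necessary condition; it does not establish that
    a mention relationship actually holds.
    """
    graph = named_graphs.get(bundle_name, frozenset())
    return any(mentioned in triple for triple in graph)


# Hypothetical data: a bundle standing in for the Wikipedia entry.
named_graphs = {
    "ex:wikipediaEntry": frozenset({
        ("ex:PatHayes", "ex:worksAt", "ex:IHMC"),
    }),
}

# The trinary relationship: (new specialized entity, mentioned entity, bundle).
mention = ("ex:PatHayesPerWikipedia", "ex:PatHayes", "ex:wikipediaEntry")
ok = mention_precondition_holds(mention[1], mention[2], named_graphs)
```

The check looks only at the mentioned entity and the bundle; the new entity (here ex:PatHayesPerWikipedia) plays no part in it, which is one reason the condition is necessary rather than sufficient.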

I really have no idea if this bears on our thoughts about named graphs in any special way. AFAIKS it should not, if we can get bundles right.

Hope this helps. Obviously there is a lot more that can be said about the PROV work, but I don't see any of it as being of particular concern to RDF.  (For example, I think the 'mention' idea will play utter havoc with OWL class reasoning, but that's not our problem.)



IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile

Received on Wednesday, 8 August 2012 02:21:32 UTC