- From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
- Date: Wed, 25 Apr 2012 10:07:00 +0100
- To: Paolo Missier <Paolo.Missier@ncl.ac.uk>
- Cc: Provenance Working Group WG <public-prov-wg@w3.org>
I have closed this issue, as
https://dvcs.w3.org/hg/prov/raw-file/default/model/prov-constraints.html#collection-constraints
now says that "PROV-DM does not provide an interpretation for
descriptions that consist of two (or more) insertion, removal,
membership relations that result in the same collection."

On Wed, Mar 7, 2012 at 09:53, Paolo Missier <Paolo.Missier@ncl.ac.uk> wrote:
> Stian
>
> I missed this (and probably other) posts in the deluge of traffic from the
> list. It's frankly becoming unwieldy.
>
> You can't disallow out of band manipulation of a data structure. All you can
> do is interpret every statement about the current state of a structure
> relative to the closed world defined by the set of available assertions.
> Indeed, this is true of every other assertion. For instance, you can analyze
> a graph and conclude that "entity e has been used twice", but all you
> actually know is that it's been used /at least/ twice.
> For anything that is not observable, or gaps in the sequence, the text
> points to wasDerivedFrom, as you note below, which of course is weaker.
> So I propose to just make this closed-world assumption (CWA) clear in the
> text, but I don't think you can do much more than this.
> That said, I will argue that the current framework is still sufficient in
> settings where data structure manipulation is entirely observable. This is
> true for the entire class of provenance about most workflow and program
> executions, for example (including good old Taverna) which is one of the
> main reasons for adding collection provenance in the first place, and a
> long-time obsession of mine. Or a monotonically growing collection of tweets
> that you see coming through a firehose (or a gardenhose?) about #Taverna,
> for example.
>
> But I am happy to go through the text together again -- I will write
> privately to make arrangements.
>
> --Paolo
>
>
>
> On 2/22/12 4:05 PM, Stian Soiland-Reyes wrote:
>>
>> This issue is still open.
>>
>> The DM says:
>>
>>
>>> In particular, no assumptions are needed regarding the mutability of a
>>> data structure that is subject to updates. In fact, the state of a
>>> collection (i.e., the set of key-value pairs it contains) at a given point
>>> in a sequence of operations is never stated explicitly. Rather, it can be
>>> obtained by querying the chain of derivation assertions involving insertions
>>> and removals. Entity type prov:type="prov:EmptyCollection"%%xsd:QName can be
>>> used in this context as it marks the start of a sequence of collection
>>> operations.
>>
>>
>> But this is contradictory. If a collection is mutable and key-values
>> can leak in or out without being recorded, then it is not possible to
>> obtain the state of the collection by following insertion/removal
>> assertions.
>>
>>
>> I argue that allowing these leaks means that CollectionInsert/Removal
>> are almost useless assertions with regard to asserting the state of a
>> collection. All you know by CollectionAfterInsertion(c3, c2, k1, v1)
>> is that it contains (k1,v1). Any knowledge asserted on c2 is
>> irrelevant with regard to c3, as c2's whole content could have
>> changed out of band.
>>
>> So therefore we should not allow out-of-band insertions/removals in
>> collections that are related using
>> CollectionAfterInsertion/CollectionAfterRemoval. Collections are
>> mutable, but prov:Collections are not. They are frozen in time.
>>
>> What we can say is that the complete state of a collection might not
>> be known unless it is traceable back to an EmptyCollection using
>> CollectionAfterInsertion and CollectionAfterRemoval - and that we
>> don't allow CollectionAfterInsertion(c3, c2, k1, v1) to overwrite k1,
>> and therefore c2 did not contain the key k1.
>>
>>
>>
>> This means that we can't use prov:Collections to cover use cases where
>> there really are leaking things in and out which are not observed.
>>
>> For instance:
>> Watching people entering a McDonalds at 9:00 pm and 10:00 pm
>>
>> agent(stian)
>> CollectionAfterInsertion(peopleInMcDonaldsAt9pm,
>>     peopleInMcDonaldsAt8pm, stian, stian)
>> // Stian exits through the back door, or while the observer is not
>> // watching
>> CollectionAfterInsertion(peopleInMcDonaldsAt10pm,
>>     peopleInMcDonaldsAt9pm, stian, stian)
>>
>> I would argue that such observations should not be asserted as
>> collections, because peopleInMcDonaldsAt10pm here is a meaningless
>> collection, there could also be lots of other people coming and going
>> between 9:00 and 10:00 pm - and so you know nothing else about its
>> state, and the beforeCollection argument is not related to the
>> afterCollection by anything but a mere wasDerivedFrom.
>>
>>
>
--
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester
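
For illustration only, the replay argument in this thread could be sketched
roughly as below. This is a minimal Python sketch, not anything defined by
PROV-DM: the tuple encoding of the assertions, the function name
collection_state and the derived_by/empty arguments are all hypothetical.
It assumes the closed-world reading (only recorded insertions and removals
happened) and the proposed rule that an insertion may not overwrite a key.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical encoding of the assertions discussed in the thread:
#   ("insert", before, key, value) ~ CollectionAfterInsertion(after, before, key, value)
#   ("remove", before, key, None)  ~ CollectionAfterRemoval(after, before, key)
# `derived_by` maps each "after" collection to the single step that produced it.
Step = Tuple[str, str, str, Optional[str]]


def collection_state(coll: str,
                     derived_by: Dict[str, Step],
                     empty: Set[str]) -> Optional[Dict[str, Optional[str]]]:
    """Replay recorded insertions/removals from an EmptyCollection up to `coll`.

    Returns the key-value pairs of `coll`, or None when the chain cannot be
    traced back to a collection in `empty` (complete state unknown).
    """
    chain = []
    seen = set()
    current = coll
    while current not in empty:
        if current not in derived_by or current in seen:
            return None  # gap (or cycle) in the chain: only wasDerivedFrom-level knowledge
        seen.add(current)
        step = derived_by[current]
        chain.append(step)
        current = step[1]  # move to the "before" collection

    state: Dict[str, Optional[str]] = {}
    for op, _before, key, value in reversed(chain):
        if op == "insert":
            if key in state:
                # Assumed rule from the thread: insertion must not overwrite a key.
                raise ValueError(f"insertion would overwrite key {key!r}")
            state[key] = value
        elif op == "remove":
            state.pop(key, None)
        else:
            raise ValueError(f"unknown operation {op!r}")
    return state


# A fully observed chain starting at an EmptyCollection c0.
observed = {
    "c1": ("insert", "c0", "k1", "v1"),
    "c2": ("insert", "c1", "k2", "v2"),
    "c3": ("remove", "c2", "k1", None),
}
print(collection_state("c3", observed, {"c0"}))

# The McDonalds example: no EmptyCollection is ever asserted, so the
# complete state of peopleInMcDonaldsAt10pm cannot be reconstructed.
mcdonalds = {
    "peopleInMcDonaldsAt9pm": ("insert", "peopleInMcDonaldsAt8pm", "stian", "stian"),
    "peopleInMcDonaldsAt10pm": ("insert", "peopleInMcDonaldsAt9pm", "stian", "stian"),
}
print(collection_state("peopleInMcDonaldsAt10pm", mcdonalds, set()))
```

Under these assumptions the first call yields the full state of c3, while
the McDonalds chain yields None. Note that if peopleInMcDonaldsAt8pm were
declared an EmptyCollection, the second insertion of the key stian would be
rejected, since the unobserved exit was never recorded as a removal.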
Received on Wednesday, 25 April 2012 09:07:53 UTC