Re: PROV-ISSUE-137 (collection-isolation): Collection assertions does not guarantee isolation [Data Model]

From: Paolo Missier <Paolo.Missier@ncl.ac.uk>
Date: Wed, 07 Mar 2012 09:53:20 +0000
Message-ID: <4F573010.7040704@ncl.ac.uk>
To: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
CC: Provenance Working Group WG <public-prov-wg@w3.org>
Stian

I missed this post (and probably others) in the deluge of traffic from the list. It's frankly becoming unwieldy.

You can't disallow out-of-band manipulation of a data structure. All you can do is interpret every statement about the current state
of a structure relative to the closed world defined by the set of available assertions. Indeed, this is true of every other
assertion. For instance, you can analyze a graph and conclude that "entity e has been used twice", but all you really know is that
it has been used /at least/ twice.
For anything that is not observable, or for gaps in the sequence, the text points to wasDerivedFrom, as you note below, which of
course is weaker.
So I propose to just make this closed-world assumption (CWA) explicit in the text, but I don't think you can do much more than this.
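
As a rough illustration of this closed-world reading, here is a minimal Python sketch. The (activity, entity) tuple encoding
and the name usage_lower_bound are assumptions made up for this example; they are not PROV-DM syntax.

```python
from collections import Counter

# Hypothetical encoding: one (activity, entity) pair per asserted usage.
assertions = [("a1", "e"), ("a2", "e"), ("a3", "f")]


def usage_lower_bound(assertions, entity):
    """Count the asserted usages of `entity`.

    The result is only a lower bound: usages that nobody recorded
    are invisible to the query.
    """
    return Counter(used for _, used in assertions)[entity]


print(f"e has been used at least {usage_lower_bound(assertions, 'e')} times")
```

The query can only ever say "at least", which is the point the text should state explicitly.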
That said, I will argue that the current framework is still sufficient in settings where data structure manipulation is entirely
observable. This is true for the entire class of provenance about most workflow and program executions, for example (including good
old Taverna), which is one of the main reasons for adding collection provenance in the first place, and a long-time obsession of
mine. The same holds for a monotonically growing collection of tweets about #Taverna that you see coming through a firehose (or a
gardenhose?).
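
For the fully observable case, obtaining the state by querying the chain of insertions and removals could look roughly like the
sketch below. The tuple encoding, the collection names c0..c3 and the helper state_of are all assumptions for illustration;
only the CollectionAfterInsertion / CollectionAfterRemoval / EmptyCollection concepts come from the DM text.

```python
# Hypothetical encoding of the assertions discussed in this thread:
#   ("insert", after, before, key, value)  ~ CollectionAfterInsertion
#   ("remove", after, before, key)         ~ CollectionAfterRemoval
EMPTY = "c0"  # assumed to be typed as prov:EmptyCollection

assertions = [
    ("insert", "c1", "c0", "k1", "v1"),
    ("insert", "c2", "c1", "k2", "v2"),
    ("remove", "c3", "c2", "k1"),
]


def state_of(collection, assertions, empty=EMPTY):
    """Rebuild the key-value pairs of `collection` from the asserted chain.

    Returns None if the chain cannot be followed back to the empty
    collection, i.e. if the asserted history has a gap.
    """
    by_after = {a[1]: a for a in assertions}
    chain, seen, current = [], set(), collection
    while current != empty:
        step = by_after.get(current)
        if step is None or current in seen:  # gap or cycle in the chain
            return None
        seen.add(current)
        chain.append(step)
        current = step[2]  # move to the "before" collection
    state = {}
    for step in reversed(chain):  # replay from oldest to newest
        if step[0] == "insert":
            state[step[3]] = step[4]
        else:
            state.pop(step[3], None)
    return state


print(state_of("c3", assertions))
```

The reconstructed state is only as complete as the assertions: it is exact when every update was observed, and otherwise it
is just what the closed world of assertions supports.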

But I am happy to go through the text together again -- I will write privately to make arrangements.

--Paolo


On 2/22/12 4:05 PM, Stian Soiland-Reyes wrote:
> This issue is still open.
>
> The DM says:
>
>
>>   In particular, no assumptions are needed regarding the mutability of a data structure that is subject to updates. In fact, the state of a collection (i.e., the set of key-value pairs it contains) at a given point in a sequence of operations is never stated explicitly. Rather, it can be obtained by querying the chain of derivation assertions involving insertions and removals. Entity type prov:type="prov:EmptyCollection"%%xsd:QName can be used in this context as it marks the start of a sequence of collection operations.
>
> But this is contradictory. If a collection is mutable and key-values
> can leak in or out without being recorded, then it is not possible to
> obtain the state of the collection by following insertion/removal
> assertions.
>
>
> I argue that allowing these leaks means that CollectionInsert/Removal
> are almost useless assertions with regard to asserting the state of a
> collection. All you know by CollectionAfterInsertion(c3, c2, k1, v1)
> is that it contains (k1,v1). Any knowledge asserted on c2 is
> irrelevant with regard to c3, as c2's whole content could have
> changed out of band.
>
> So we should not allow out-of-band insertions/removals in
> collections that are related using
> CollectionAfterInsertion/CollectionAfterRemoval. Collections are
> mutable, but prov:Collections are not. They are frozen in time.
>
> What we can say is that the complete state of a collection might not
> be known unless it is traceable back to an EmptyCollection using
> CollectionAfterInsertion and CollectionAfterRemoval - and that we
> don't allow CollectionAfterInsertion(c3, c2, k1, v1) to overwrite k1,
> and therefore c2 did not contain the key k1.
>
>
>
> This means that we can't use prov:Collections to cover use cases where
> there really are leaking things in and out which are not observed.
>
> For instance:
> Watching people entering a McDonalds at 9:00 pm and 10:00 pm
>
> agent(stian)
> CollectionAfterInsertion(peopleInMcDonaldsAt9pm,
> peopleInMcDonaldsAt8pm, stian, stian)
> // Stian exits through the back door, or while the observer is not watching
> CollectionAfterInsertion(peopleInMcDonaldsAt10pm,
> peopleInMcDonaldsAt9pm, stian, stian)
>
> I would argue that such observations should not be asserted as
> collections, because peopleInMcDonaldsAt10pm here is a meaningless
> collection. Lots of other people could also have come and gone
> between 9:00 and 10:00 pm, so you know nothing else about its
> state, and the beforeCollection argument is not related to the
> afterCollection by anything but a mere wasDerivedFrom.
>
>
Received on Wednesday, 7 March 2012 09:53:50 GMT
