Re: PROV-ISSUE-137 (collection-isolation): Collection assertions does not guarantee isolation [Data Model] from Stian Soiland-Reyes on 2012-02-22 (public-prov-wg@w3.org from February 2012)

From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
Date: Wed, 22 Feb 2012 16:05:43 +0000
To: Provenance Working Group WG <public-prov-wg@w3.org>
Message-ID: <CAPRnXtn1XnkfKyWT2WDeUgXSLEx5saaX180gsG=x3Eo4aomY0w@mail.gmail.com>
This issue is still open.

The DM says:


>  In particular, no assumptions are needed regarding the mutability of a data structure that is subject to updates. In fact, the state of a collection (i.e., the set of key-value pairs it contains) at a given point in a sequence of operations is never stated explicitly. Rather, it can be obtained by querying the chain of derivation assertions involving insertions and removals. Entity type prov:type="prov:EmptyCollection"%%xsd:QName can be used in this context as it marks the start of a sequence of collection operations.


But this is contradictory. If a collection is mutable and key-values
can leak in or out without being recorded, then it is not possible to
obtain the state of the collection by following insertion/removal
assertions.


I argue that allowing these leaks makes CollectionAfterInsertion and
CollectionAfterRemoval almost useless for asserting the state of a
collection. All you know from CollectionAfterInsertion(c3, c2, k1, v1)
is that c3 contains (k1,v1). Any knowledge asserted about c2 is
irrelevant with regard to c3, as c2's whole content could have
changed out of band.
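
To make the leak concrete, here is a minimal Python sketch. It is not
PROV-DM syntax: the dict-based model, the tuple encoding of assertions
and the name replay() are all assumptions of mine, for illustration
only. It rebuilds a collection from the recorded assertions and
compares the result with a mutable collection that also saw one
unrecorded removal.

```python
# Hypothetical model: a collection is a dict of key -> value, and each
# recorded assertion is ("insert", key, value) or ("remove", key).

def replay(ops):
    """Rebuild the state implied by the recorded assertions only."""
    state = {}  # start from an empty collection
    for op in ops:
        if op[0] == "insert":
            _, key, value = op
            state[key] = value
        elif op[0] == "remove":
            state.pop(op[1], None)
        else:
            raise ValueError(f"unknown operation: {op[0]}")
    return state

# Recorded provenance: two insertions.
recorded = [("insert", "k0", "v0"), ("insert", "k1", "v1")]

# The real, mutable collection saw the same insertions plus one
# removal of k0 that nobody asserted (the "leak").
actual = {}
actual["k0"] = "v0"
del actual["k0"]      # out-of-band change, never recorded
actual["k1"] = "v1"

replayed = replay(recorded)
print(replayed == actual)  # False: the replay still believes k0 is present
print("k1" in actual)      # True: only the last insertion is certain
```

If leaks are allowed, the last line is the only thing a consumer of
the assertions can rely on.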

Therefore we should not allow out-of-band insertions/removals in
collections that are related using
CollectionAfterInsertion/CollectionAfterRemoval. Collections in
general are mutable, but prov:Collections are not. They are frozen in
time.

What we can say is that the complete state of a collection might not
be known unless it is traceable back to an EmptyCollection using
CollectionAfterInsertion and CollectionAfterRemoval. We should also
say that CollectionAfterInsertion(c3, c2, k1, v1) is not allowed to
overwrite k1, which implies that c2 did not contain the key k1.
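
As a rough sketch of that rule in Python (again my own illustrative
encoding, not anything defined by the DM: the function known_state()
and the empty/inserted/removed structures are hypothetical, and each
collection is assumed to have at most one recorded predecessor), the
state is only reported as complete when the chain reaches an
EmptyCollection, and an insertion that would overwrite a key is
rejected:

```python
# Hypothetical encoding of the assertions as Python data:
#   empty:    set of ids typed as EmptyCollection
#   inserted: after_id -> (before_id, key, value)
#   removed:  after_id -> (before_id, key)

def known_state(cid, empty, inserted, removed):
    """Return (state, complete) for collection `cid`.

    `state` holds the pairs known to be present; `complete` is True
    only if the chain is traceable back to an EmptyCollection.
    """
    steps = []
    seen = set()
    current = cid
    while current not in empty and current not in seen:
        seen.add(current)  # guard against cyclic assertions
        if current in inserted:
            before, key, value = inserted[current]
            steps.append(("insert", key, value))
        elif current in removed:
            before, key = removed[current]
            steps.append(("remove", key, None))
        else:
            break  # no recorded origin for this collection
        current = before
    complete = current in empty

    state = {}
    for kind, key, value in reversed(steps):  # oldest step first
        if kind == "insert":
            if key in state:
                raise ValueError(f"insertion would overwrite key {key!r}")
            state[key] = value
        else:
            state.pop(key, None)
    return state, complete

empty = {"c0"}
inserted = {
    "c1": ("c0", "k1", "v1"),
    "c2": ("c1", "k2", "v2"),
    "c9": ("c8", "k9", "v9"),  # c8 has no recorded origin
}
removed = {"c3": ("c2", "k1")}

print(known_state("c3", empty, inserted, removed))  # ({'k2': 'v2'}, True)
print(known_state("c9", empty, inserted, removed))  # ({'k9': 'v9'}, False)
```

For c9 the only certain fact is the last inserted pair, because c8
cannot be traced back any further.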



This means that we can't use prov:Collections to cover use cases where
things really do leak in and out unobserved.

For instance, watching people entering a McDonald's at 9:00 pm and
10:00 pm:

agent(stian)
CollectionAfterInsertion(peopleInMcDonaldsAt9pm,
peopleInMcDonaldsAt8pm, stian, stian)
// Stian exits through the back door, or while the observer is not watching
CollectionAfterInsertion(peopleInMcDonaldsAt10pm,
peopleInMcDonaldsAt9pm, stian, stian)

I would argue that such observations should not be asserted as
collections, because peopleInMcDonaldsAt10pm here is a meaningless
collection. Lots of other people could also have come and gone
between 9:00 and 10:00 pm, so you know nothing else about its
state, and the beforeCollection argument is related to the
afterCollection by nothing more than a mere wasDerivedFrom.





On Sun, Oct 30, 2011 at 00:29, Provenance Working Group Issue Tracker
<sysbot+tracker@w3.org> wrote:
>
> PROV-ISSUE-137 (collection-isolation): Collection assertions does not guarantee isolation [Data Model]
>
> http://www.w3.org/2011/prov/track/issues/137
>
> Raised by: Stian Soiland-Reyes
> On product: Data Model
>
> http://www.w3.org/TR/prov-dm/#expression-Collection introduces relations for expressing collection modifications, including:
>
>> Expression: wasAddedTo_Coll(c2,c1) (resp. wasRemovedFrom_Coll(c2,c1)) denotes that collection c2 is an updated version of collection c1, following an insertion (resp. deletion) operation.
>
> a) Can other entities/keys be added or removed from c1 during its lifetime (on their own without wasAdded/Removed assertions), or is its whole content fixed for the duration of the entity c1?
>
> b) in wasAddedTo_Coll(c2,c1)  will c2 contain every key/value of c1, in addition to the added key/entity? (ignoring for now the separate issue collection-collision) - or could some c1 keys/elements be missing - or other keys/elements also have been added to c2?
>
> c) (equivalent for wasRemovedFrom_Coll(c3, c2))
>
> d) Is it possible to have both wasAddedTo_Coll(c2, c1) and wasRemovedFrom_Coll(c2,c1) at the same time, or are these functionally  (issue collection-functional) exclusive? If it is possible - in which order should they be interpreted if they state the same key/value?
>
>
> In short - I believe the collection assertions are useful - but they should also come with a strong promise that no other elements were added/removed between c2 and c1 - otherwise for all you know all of c1 has been removed "offline" from c2 (a hole in the bucket, so to speak), and you can then never look further back than the last added key/element.
>
>
>
>
>



-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester
Received on Wednesday, 22 February 2012 16:06:36 UTC