Re: PROV-ISSUE-137 (collection-isolation): Collection assertions does not guarantee isolation [Data Model] from Stian Soiland-Reyes on 2012-04-25 (public-prov-wg@w3.org from April 2012)

From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
Date: Wed, 25 Apr 2012 10:07:00 +0100
To: Paolo Missier <Paolo.Missier@ncl.ac.uk>
Cc: Provenance Working Group WG <public-prov-wg@w3.org>
Message-ID: <CAPRnXtmtR+2f5ygUF+DbNHRBRyOte1OaoJPABadDn83N4hSK3g@mail.gmail.com>
I have closed this issue, as
https://dvcs.w3.org/hg/prov/raw-file/default/model/prov-constraints.html#collection-constraints
now says that "PROV-DM does not provide an interpretation for
descriptions that consist of two (or more) insertion, removal,
membership relations that result in the same collection.".


On Wed, Mar 7, 2012 at 09:53, Paolo Missier <Paolo.Missier@ncl.ac.uk> wrote:
> Stian
>
> I missed this (and probably other) posts in the deluge of traffic from the
> list. It's frankly becoming unwieldy.
>
> You can't disallow out of band manipulation of a data structure. All you can
> do is interpret every statement about the current state of a structure
> relative to the closed world defined by the set of available assertions.
> Indeed, this is true of every other assertion. For instance, you can analyze
> a graph and conclude that "entity e has been used twice", but you know it's
> been used /at least/ twice.
> For anything that is not observable, or gaps in the sequence, the text
> points to wasDerivedFrom, as you note below, which of course is weaker.
> So I propose to just make this CWA clear in the text, but I don't think you
> can do much more than this.
> That said, I will argue that the current framework is still sufficient in
> settings where data structure manipulation is entirely observable. This is
> true for the entire class of provenance about most workflow and program
> executions, for example (including good old Taverna) which is one of the
> main reasons for adding collection provenance in the first place, and a
> long-time obsession of mine. Or a monotonically growing collection of tweets
> that you see coming through a firehose (or a gardenhose?) about #Taverna,
> for example.
>
> But I am happy to go through the text together again -- I will write
> privately to make arrangements.
>
> --Paolo
>
>
>
> On 2/22/12 4:05 PM, Stian Soiland-Reyes wrote:
>>
>> This issue is still open.
>>
>> The DM says:
>>
>>
>>>  In particular, no assumptions are needed regarding the mutability of a
>>> data structure that is subject to updates. In fact, the state of a
>>> collection (i.e., the set of key-value pairs it contains) at a given point
>>> in a sequence of operations is never stated explicitly. Rather, it can be
>>> obtained by querying the chain of derivation assertions involving insertions
>>> and removals. Entity type prov:type="prov:EmptyCollection"%%xsd:QName can be
>>> used in this context as it marks the start of a sequence of collection
>>> operations.
>>
>>
>> But this is contradictory. If a collection is mutable and key-values
>> can leak in or out without being recorded, then it is not possible to
>> obtain the state of the collection by following insertion/removal
>> assertions.
>>
>>
>> I argue that allowing these leaks means that CollectionInsert/Removal
>> are almost useless assertions with regard to asserting the state of a
>> collection. All you know by CollectionAfterInsertion(c3, c2, k1, v1)
>> is that it contains (k1,v1). Any knowledge asserted on c2 is
>> irrelevant with regard to c3, as c2's whole content could have
>> changed out of band.
>>
>> Therefore we should not allow out-of-band insertions/removals in
>> collections that are related using
>> CollectionAfterInsertion/CollectionAfterRemoval. Collections are
>> mutable, but prov:Collections are not. They are frozen in time.
>>
>> What we can say is that the complete state of a collection might not
>> be known unless it is traceable back to an EmptyCollection using
>> CollectionAfterInsertion and CollectionAfterRemoval - and that we
>> don't allow CollectionAfterInsertion(c3, c2, k1, v1) to overwrite k1,
>> and therefore c2 did not contain the key k1.
>>
>>
>>
>> This means that we can't use prov:Collections to cover use cases where
>> things really do leak in and out without being observed.
>>
>> For instance:
>> Watching people entering a McDonalds at 9:00 pm and 10:00 pm
>>
>> agent(stian)
>> CollectionAfterInsertion(peopleInMcDonaldsAt9pm,
>> peopleInMcDonaldsAt8pm, stian, stian)
>> // Stian exits through the back door, or while the observer is not
>> watching
>> CollectionAfterInsertion(peopleInMcDonaldsAt10pm,
>> peopleInMcDonaldsAt9pm, stian, stian)
>>
>> I would argue that such observations should not be asserted as
>> collections, because peopleInMcDonaldsAt10pm here is a meaningless
>> collection: lots of other people could also be coming and going
>> between 9:00 and 10:00 pm, so you know nothing else about its
>> state, and the beforeCollection argument is not related to the
>> afterCollection by anything but a mere wasDerivedFrom.
>>
>>
>
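
The traceability rule proposed in the quoted thread (the complete state is
known only if the chain of insertions and removals leads back to an
EmptyCollection, and an insertion may not overwrite an existing key) can be
sketched roughly as follows. This is only an illustration under stated
assumptions: the tuple encoding, the function name known_state and the
collection names c0..c3 are hypothetical and are not PROV-DM syntax, and
treating removal of an absent key as a no-op is an assumption the thread does
not address. Keying the table by the "after" collection means each collection
has at most one producing assertion, which sidesteps the case the constraints
document leaves uninterpreted.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical encoding, not PROV-DM syntax: each "after" collection maps to
# the single assertion that produced it: (operation, before, key, value).
Step = Tuple[str, str, str, Optional[str]]


def known_state(target: str,
                derivations: Dict[str, Step],
                empty: Set[str]) -> Optional[Dict[str, Optional[str]]]:
    """Return the complete key-value state of `target`, or None if the chain
    of insertions/removals does not lead back to an empty collection."""
    chain = []
    visited = set()
    current = target
    while current not in empty:
        if current in visited or current not in derivations:
            return None  # gap or cycle: the complete state is not knowable
        visited.add(current)
        step = derivations[current]
        chain.append(step)
        current = step[1]  # move to the "before" collection

    # Replay the operations from the empty collection forward.
    state: Dict[str, Optional[str]] = {}
    for operation, _before, key, value in reversed(chain):
        if operation == "insert":
            if key in state:
                raise ValueError(f"insertion would overwrite key {key!r}")
            state[key] = value
        elif operation == "remove":
            state.pop(key, None)  # assumption: removing an absent key is a no-op
        else:
            raise ValueError(f"unknown operation {operation!r}")
    return state


# Fully observed chain: c0 is asserted to be an empty collection.
observed = {
    "c1": ("insert", "c0", "k1", "v1"),
    "c2": ("insert", "c1", "k2", "v2"),
    "c3": ("remove", "c2", "k1", None),
}
print(known_state("c3", observed, empty={"c0"}))  # {'k2': 'v2'}

# The McDonalds example: nothing traces back to an empty collection.
mcdonalds = {
    "peopleInMcDonaldsAt9pm": ("insert", "peopleInMcDonaldsAt8pm", "stian", "stian"),
    "peopleInMcDonaldsAt10pm": ("insert", "peopleInMcDonaldsAt9pm", "stian", "stian"),
}
print(known_state("peopleInMcDonaldsAt10pm", mcdonalds, empty=set()))  # None
```

In the second case the function returns None rather than {stian: stian},
which mirrors the point that without a traceable chain only individual
memberships can be claimed, not the full state.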



-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester
Received on Wednesday, 25 April 2012 09:07:53 UTC