
Re: Collections in PROV-O

From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
Date: Fri, 24 Feb 2012 09:56:07 +0000
Message-ID: <CAPRnXtm-MXsrfHXdiK=ciexhwJ3NJCOjQc-8wxepEq39cEqweA@mail.gmail.com>
To: Paolo Missier <Paolo.Missier@ncl.ac.uk>
Cc: Jun Zhao <jun.zhao@zoo.ox.ac.uk>, "public-prov-wg@w3.org" <public-prov-wg@w3.org>
On Thu, Feb 23, 2012 at 15:59, Paolo Missier <Paolo.Missier@ncl.ac.uk> wrote:
> But Stian's rendering is not valid exactly because he is ab-using the
> insertion mechanism by omitting the key. To me this is "scruffy". I could do

Yes, this is very scruffy. It is like saying X was generated by an
activity which used Y, but without identifying the activity. That's a
'power' of RDF if you like, which would not easily be mapped to
PROV-DM (unless we introduce an anonymous _: prefix, like in Turtle).


> Either we introduce an extra relation for containment (and think through how
> it plays out with the current state update framework), or we use the current
> framework "properly".  The framework states that the only way you can
> /achieve/ containment of A into B is by asserting that A was added to B.
> Containment is a consequence of this.

But Jun might not know when A was added to B, she just observed that B
(that perhaps came out of a workflow activity) contained A - and
perhaps later she used A - and so we want to have a trace to say where
A came from. You can just say that A was derived from B (through a
get-from-collection kind of activity) - but that's not very
transparent.

I can see two problems with allowing collection members to be
asserted directly:

a) Users could say "impossible" things such as:
  A contained { (a,1), (b,2) }
  B was made by adding (c,3) to A
  B contained { (b,2), (d,4) }.

b) Under an open world assumption it might be hard to assert that the
members of A were x,y,z and nothing else. Other times one might
actually want to say it contained x,y,z AND possibly something else
(unknown). See RDF collections and the problem of 'closing' a list.
[2]
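
To make problem (a) concrete, here is a minimal sketch in Python. It
assumes a collection state is just a set of pairs and that both
membership assertions are meant to be complete; the function name
check_insertion is hypothetical and not part of PROV-DM.

```python
# Hypothetical model: a collection state is a set of pairs, and an
# insertion yields the old state plus the one added pair.

def check_insertion(a_members, added, b_members):
    """Compare the asserted members of B with what the insertion implies.

    Returns (missing, unexpected): pairs that should be in B but are not
    asserted, and pairs asserted in B that the insertion cannot explain.
    Assumes both membership assertions claim to be complete.
    """
    expected = set(a_members) | {added}
    observed = set(b_members)
    missing = expected - observed
    unexpected = observed - expected
    return missing, unexpected


# The "impossible" example from (a).
A = {("a", 1), ("b", 2)}
B = {("b", 2), ("d", 4)}

missing, unexpected = check_insertion(A, ("c", 3), B)
print(sorted(missing))     # [('a', 1), ('c', 3)]
print(sorted(unexpected))  # [('d', 4)]
```

This check only flags a contradiction if the assertions are read as
closed. Under the open world reading in (b), B listing fewer pairs is
not an error, and (d,4) could be a member of A that was never stated.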


We should make collection member containment an issue/request.

Jun - could you write it as an issue in the tracker and give a simple
use case for when one would want to state containment?



> BTW 1: I don't see why having literal keys is a problem. I may be missing
> it. Nesting is accommodated by having values which are Entities (of type
> prov:Collection).

How would you represent this set of sets of empty sets (ugh!) using literal keys?

A = {
  { {}, {} }
  { {} }
  { }
}

You have to make up some keys. (I made the inner set members empty
sets instead of literals {a,b,c} so that you can't be tempted to say
"We'll call the first key 'a_b_c'" etc :) )

Two asserters of provenance about modifications to this set could
generate totally different keys.

If you use entities as keys, it's easy, as you just use the value as
the key. For bags (unordered, allows duplicates) you still have a
problem, and would probably have to resort to a fresh entity with a
randomly assigned id for every key. RDF containers [1] use rdf:_n for
this purpose, for instance
  :bag rdf:_5 :fifth .
(even if the bag is unordered)
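
A rough Python sketch of the difference, assuming a collection is
modelled as a dict from key to value. The helper names
set_as_collection and bag_as_collection are made up for illustration,
and frozenset stands in for an entity that can serve as its own key.

```python
import uuid


def set_as_collection(values):
    """Set semantics: each member is its own key, so any two asserters
    who see the same members derive the same keys."""
    return {v: v for v in values}


def bag_as_collection(values):
    """Bag semantics: duplicates are allowed, so the value cannot be the
    key. Each member gets a fresh random key instead."""
    return {uuid.uuid4(): v for v in values}


empty = frozenset()
# The members of A from the example: { {}, {} }, { {} } and { }.
# Read as a set, { {}, {} } is the same as { {} }.
members = [frozenset({empty, empty}), frozenset({empty}), empty]

# Two asserters visiting the members in different orders.
alice = set_as_collection(members)
bob = set_as_collection(reversed(members))
print(alice == bob)   # True
print(len(alice))     # 2

bag = bag_as_collection(members)
print(len(bag))       # 3
```

With set semantics the duplicate member collapses and both asserters
agree on the keys. With bag semantics all three members survive, but
two asserters calling bag_as_collection would end up with unrelated
random keys, which is the same problem as with made-up literal keys.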


> BTW 2: when did prov:Container start to be used for collections?

Sorry, I obviously meant prov:Collection :)

[1] http://www.w3.org/TR/rdf-primer/#containers
[2] http://www.w3.org/TR/rdf-primer/#collections


-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester
Received on Friday, 24 February 2012 09:56:55 GMT
