
Re: actions related to collections

From: Satya Sahoo <satya.sahoo@case.edu>
Date: Thu, 19 Apr 2012 11:35:45 -0400
Message-ID: <CAOMwk6zB_+BDhVm_ASJw8v2cX_Hh9hNTvNEuR1EssgKqq87n2w@mail.gmail.com>
To: Paolo Missier <Paolo.Missier@ncl.ac.uk>
Cc: "public-prov-wg@w3.org" <public-prov-wg@w3.org>
Hi Paolo,
Similar to Jun, my use case also requires 2 and not just 1.

On Thu, Apr 19, 2012 at 3:55 AM, Paolo Missier <Paolo.Missier@ncl.ac.uk> wrote:

>  Good morning. Catching up by picking up on the latest mail for continuity.
> My thoughts:
>
> - I like Tim's proposal to rename the current form of Collections as
> Dictionaries, as it's what they are -- and the current DM text does
> acknowledge that.
>
> So let's consider:
>
> 1- static sets (or multisets), like those in Satya & Jun's examples. These
> are necessary but, I will argue, not sufficient. They can answer the
> questions "what does set s contain?" and "is x a member of s?"
>
> 2- sets (or multisets) that are subject to updates. In general the
> question you want to support is: "how did this set reach its current
> state?"
>    Specific examples given in the past include: programs that manipulate
> lists (or even simpler sets), or more mundane ones: monitoring people going
> in and out of a building, counting people who board a plane, etc.
>
> 3- dictionaries, which are sets with more interesting properties (you can
> use the keys for indexing, you can simulate ordered lists by encoding the
> position in the key, etc.)
>    Incidentally, I find that user-defined keys are more general than
> conventionally imposed key names as in RDF (rdf:_1 etc.)
>    you can now track dependencies of the form: "entity with key k1 in
> dictionary d1 was derived from entity with key k2 in dictionary d2", where
> keys guarantee uniqueness.
>
> (All of the above can be nested structures simply by assuming, as we have
> done, that elements can be sets themselves)
>
> So what I observe is that 3 subsumes 2 subsumes 1.
>
>

I did not understand this - as I see it, 1 subsumes 2 subsumes 3 (since 3
is the most specialized version of 1).
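The difference between kinds 1 and 2 in Paolo's list can be sketched in Python. This is a minimal sketch under an assumption not stated in the thread: every update produces a new immutable snapshot that points back to the snapshot it was derived from. The class and field names (`SetVersion`, `derived_from`, `op`) are hypothetical and are not PROV-DM terms.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SetVersion:
    """One immutable state of a set; `derived_from` links it to the previous state."""
    members: frozenset
    derived_from: Optional["SetVersion"] = None  # hypothetical link to the prior version
    op: Optional[str] = None                     # hypothetical label for the update

    def insert(self, e):
        return SetVersion(self.members | {e}, self, f"insert {e}")

    def remove(self, e):
        return SetVersion(self.members - {e}, self, f"remove {e}")

    def history(self):
        """Answer the kind-2 question: which updates led to this state, oldest first?"""
        steps, v = [], self
        while v.derived_from is not None:
            steps.append(v.op)
            v = v.derived_from
        return list(reversed(steps))


# Kind 1: a static set only answers membership questions.
building = SetVersion(frozenset({"alice"}))
print("alice" in building.members)  # True

# Kind 2: people entering and leaving a building leave a derivation trail.
later = building.insert("bob").remove("alice")
print(sorted(later.members))  # ['bob']
print(later.history())        # ['insert bob', 'remove alice']
```

A dictionary (kind 3) would add user-chosen keys on top of this, so that a derivation could point at "the entity under key k1" rather than at a bare member.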




> One possibility is to have a Set type for 1 and 2 (I see no point having a
> specific type for 1), and Dictionary for 3. This is done using prov:type.
>
> But then again, why not just have Dictionary. It minimizes the number of
> definitions. If all I need is a set (2), I can just have pairs (e,e) as
> members --no need to invent keys. If I only need (1), I don't use
> insert/removal.
>

I would say we should have the more generic version and allow users to
define their own specialized constructs.
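Paolo's suggestion of encoding a plain set as a dictionary whose members are (e,e) pairs can be sketched as follows. The helper names (`set_as_dict`, `insert`, `remove`) are hypothetical; the sketch assumes updates return a new dictionary rather than mutating the old one, so earlier states remain available for provenance.

```python
def set_as_dict(members):
    """Encode a set as a dictionary in which every member is its own key."""
    return {e: e for e in members}


def insert(d, key, value):
    """Return a new dictionary with (key, value) added; `d` is left unchanged."""
    new = dict(d)
    new[key] = value
    return new


def remove(d, key):
    """Return a new dictionary without `key`; `d` is left unchanged."""
    return {k: v for k, v in d.items() if k != key}


people = set_as_dict(["alice", "bob"])
print("alice" in people)  # True: set membership becomes a key lookup

after = remove(insert(people, "carol", "carol"), "bob")
print(sorted(after))   # ['alice', 'carol']
print(sorted(people))  # ['alice', 'bob'] (the earlier state is preserved)
```

The trade-off is that every member is written twice, and a reader has to know the (e,e) convention to recognise the dictionary as a set.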

Thanks.

Best,
Satya


> Additional thoughts?
>
> -Paolo
>
>
>
>
> On 4/19/12 6:31 AM, Luc Moreau wrote:
>
> Hi Tim,
>
>  Your position in favour of prov:dictionary is really clear.
>
>  Two questions:
>
>  1. Is prov:dictionary an essential feature of prov-dm, and should it stay
> in the prov-dm document?
>
>  2. What about Jun/Satya's request for a simple membership property?
> Should it be added to prov-dm?
>
> Professor Luc Moreau
> Electronics and Computer Science
> University of Southampton
> Southampton SO17 1BJ
> United Kingdom
>
> On 18 Apr 2012, at 23:08, "Timothy Lebo" <lebot@rpi.edu> wrote:
>
>  Luc,
>
>  On Apr 18, 2012, at 4:19 PM, Luc Moreau wrote:
>
>  Dear all,
>
> I just wanted to throw out a few ideas/questions in defense of collections as
> they currently are.
>
> 1. prov:Collection is similar to rdfs:Container [1] :
> the properties rdf:_1, rdf:_2, ...[2]  map naturally to keys in
> prov:Collection.
>
>
>  I don't see how these map.
> In prov:Collection, keys have values chosen by the user -- rdfs:Container
> imposes the rdf:_N "value" for the "key".
> rdfs:Container doesn't support keys.
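Tim's distinction can be illustrated with a small sketch: an rdfs:Container fixes each key from the member's position (rdf:_1, rdf:_2, ...), whereas prov:Collection as currently drafted lets the user choose the key. The function name `container_keys` and the compact `rdf:_N` strings are illustrative assumptions, not an RDF library API.

```python
def container_keys(items):
    """rdfs:Container convention: the i-th member (1-based) sits under rdf:_i."""
    return {f"rdf:_{i}": item for i, item in enumerate(items, start=1)}


# Keys imposed by position: they carry no meaning beyond order.
print(container_keys(["report.pdf", "data.csv"]))
# {'rdf:_1': 'report.pdf', 'rdf:_2': 'data.csv'}

# User-chosen keys, as in prov:Collection: the key itself says what the member is.
bindings = {"input": "data.csv", "output": "report.pdf"}
print(bindings["input"])  # data.csv
```

Mapping a container into a keyed dictionary is trivial; going the other way requires picking an order and discards the names the user chose as keys.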
>
>  I think there is consensus that prov:Collection as it stands is _more_
> than set membership.
> I argue that this more expressive construct is incredibly useful but
> misleadingly named.
>
>
> 2. RDF collections [3] can also be described by prov:Collection, using
> rdf:first and rdf:rest
>     as keys for a collection of two elements, and allowing nesting of
> collections.
>
>
>  Although it's true that one can reproduce an rdf:List using the current
> definition of prov:Collection,
> I'm not sure this provides "nesting" in any useful form.
> It also shows how prov:Collection is a more general construct than
> rdf:List.
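The encoding Luc describes, using rdf:first and rdf:rest as the two keys of each nested collection, can be sketched with plain Python dictionaries. The `rdf:nil` terminator follows the RDF list vocabulary; representing nodes as Python dicts and the helper names `to_rdf_list`/`from_rdf_list` are assumptions made only for illustration.

```python
RDF_NIL = "rdf:nil"  # terminator of an RDF list


def to_rdf_list(items):
    """Encode a Python list as nested two-key dictionaries (rdf:first / rdf:rest)."""
    node = RDF_NIL
    for item in reversed(items):
        node = {"rdf:first": item, "rdf:rest": node}
    return node


def from_rdf_list(node):
    """Walk the rdf:rest chain back into a Python list."""
    out = []
    while node != RDF_NIL:
        out.append(node["rdf:first"])
        node = node["rdf:rest"]
    return out


lst = to_rdf_list(["a", "b"])
print(lst)  # {'rdf:first': 'a', 'rdf:rest': {'rdf:first': 'b', 'rdf:rest': 'rdf:nil'}}
print(from_rdf_list(lst))  # ['a', 'b']
```

Every node is a two-key collection nested inside the previous one, which is the sense of "nesting" in Luc's point; whether that nesting is useful in practice is the question Tim raises.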
>
>
>
> So a few questions:
>
> 1. Is it being suggested that rdfs:Container and rdf:List are not
> appropriate, and we
>     should look at other forms of "collections"?
>
>
>
>  I'm suggesting we rename "collection" to "dictionary". The confusion is
> occurring when people read prov:Collection definitions as if it is set
> membership, which it is not optimized for.
> The capabilities that it _is_ optimized for are very useful, should stay,
> will be used heavily, but should be renamed to something less misleading.
>
>
>
> 2. Has the prov-o ontology encoded prov-dm collections in a way that is
> lightweight enough?
>     Could we for instance restrict the keys to be mapped to  properties
> such as rdf:_1, rdf:_2?
>
>
>  I'm not sure why we want to contort the elegance of the Dictionary into
> something that is less expressive (rdfs:Container), and which has been
> disregarded for practical uses during the decade that it has been available.
>
>
>
>
>
>  I however acknowledge that prov:Collection is not a "natural" way to model
> a set.
>
>
>  prov:Dictionary!
>
>
>  I suppose that
> like  "rdf:Bag class is used conventionally to indicate to a human reader
> that the container is intended to be unordered",
> we would need a similar notion for expressing sets with prov:Collection.
>
>
>  We should leave modeling sets to SIOC and RDFS and focus on giving the
> community something that it doesn't have -- a construct that lets us encode
> the provenance of function calls with multiple inputs and multiple outputs.
>
>  We don't have a set membership construct and we shouldn't encourage
> people to misuse a dictionary to model a set.
>
>
>  -Tim
>
>
>
> Cheers,
> Luc
>
> [1] http://www.w3.org/TR/rdf-schema/#ch_container
> [2] http://www.w3.org/TR/rdf-schema/#ch_containermembershipproperty
> [3] http://www.w3.org/TR/rdf-schema/#ch_collectionvocab
>
>
> On 18/04/12 19:39, Stephan Zednik wrote:
>
>
>  On Apr 18, 2012, at 12:24 PM, Timothy Lebo wrote:
>
>  I've had similar concerns that the definitions for collections are "too
> heavyweight" to manage the membership of sets.
>
> But ignoring its name and looking at the modeling construct it
> provides, it's clear that this construct will be very useful in many real
> provenance problems (for example, the very ubiquitous need for provenance
> of function calls with their argument names and bindings).
>
>  Perhaps we can avoid the "too heavyweight for set membership" concerns
> raised by Satya and Jun by renaming what we have (prov:Collection) to
> something more appropriate, like prov:Dictionary?
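The function-call use case Tim mentions can be sketched as follows: the inputs of one call become a dictionary keyed by parameter name, and the outputs a dictionary keyed by output name. `record_call`, `split_amount`, and the record layout are hypothetical; only `inspect.signature`, `Signature.bind`, and `BoundArguments.apply_defaults` are real standard-library calls.

```python
import inspect


def record_call(fn, output_names, *args, **kwargs):
    """Hypothetical helper: call fn and describe the call as two dictionaries.

    `fn` must return a tuple with one value per name in `output_names`.
    """
    bound = inspect.signature(fn).bind(*args, **kwargs)
    bound.apply_defaults()  # record defaulted parameters too
    results = fn(*bound.args, **bound.kwargs)
    record = {
        "activity": fn.__name__,
        "inputs": dict(bound.arguments),              # parameter name -> value
        "outputs": dict(zip(output_names, results)),  # output name -> value
    }
    return results, record


def split_amount(total, parts=3):
    return total // parts, total % parts


_, rec = record_call(split_amount, ("quotient", "remainder"), 10)
print(rec["inputs"])   # {'total': 10, 'parts': 3}
print(rec["outputs"])  # {'quotient': 3, 'remainder': 1}
```

The argument names act as dictionary keys, which is what lets a later derivation say "output `quotient` was derived from input `total`" rather than only "from one of the inputs".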
>
>
>  +1
>
>  Jim is right that you can model collections with enumerated classes, but
> I am not sure about stating the provenance of a collection defined by an
> enumerated class.
>
>  We could also define a much simpler prov:Collection class that does not
> force map/dictionary conventions to go along with prov:Dictionary.
>
>  --Stephan
>
>
>  -Tim
>
>  On Apr 18, 2012, at 2:12 PM, Jim McCusker wrote:
>
> I think a set of key-value pairs is what's known as a map or dictionary. A
> collection is a set of things with a defined membership. In OWL it would
> probably be represented as an enumerated class.
>
>  Jim
>
> On Wed, Apr 18, 2012 at 1:20 PM, Jun Zhao <jun.zhao@zoo.ox.ac.uk> wrote:
>
>>
>> Dear all,
>>
>> I concur with what Satya wrote. The example I had in mind is
>> collection-type entities in the blogosphere of the Web.
>>
>> As we all know SIOC is a widely used vocabulary to describe entities in
>> the online community sites, like blogs, wikis, etc. It has the concept of
>> sioc:Container, which is defined as "a high-level concept used to group
>> content Items together". The relationships between a sioc:Container and the
>> sioc:Items or sioc:Posts that belong to it are described using
>> sioc:container_of and sioc:has_container properties.
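The SIOC pattern Jun describes needs nothing beyond a membership relation and its inverse. A minimal sketch with plain triples follows; the "triple store" is just a Python set, the `ex:` identifiers and the `members` helper are invented for illustration, and the property names are the SIOC ones quoted above.

```python
# Triples as (subject, predicate, object); ex: identifiers are made up for illustration.
triples = {
    ("ex:blog", "sioc:container_of", "ex:post1"),
    ("ex:blog", "sioc:container_of", "ex:post2"),
}

# sioc:has_container is the inverse of sioc:container_of, so it can be derived.
triples |= {(o, "sioc:has_container", s)
            for (s, p, o) in triples if p == "sioc:container_of"}


def members(container):
    """Answer 'what does this container hold?' with a simple lookup."""
    return sorted(o for (s, p, o) in triples
                  if s == container and p == "sioc:container_of")


print(members("ex:blog"))  # ['ex:post1', 'ex:post2']
print(("ex:post1", "sioc:has_container", "ex:blog") in triples)  # True
```

No keys are involved: provenance about `ex:blog` or about each post can be attached to those identifiers directly.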
>>
>> The provenance of a sioc:Container could be who is/are responsible for
>> the container, who created this container, and when.
>>
>> The provenance of a sioc:Post could include when the post was
>> published, when it was modified, by whom, based on which other posts,
>> document or data.
>>
>> As you can see, I am struggling to see how a key-value pair kind of
>> structure would fit into the above simple scenario. But please correct me
>> if I am wrong.
>>
>> HTH,
>>
>> Jun
>>
>>
>>
>>
>> On 18/04/2012 18:35, Satya Sahoo wrote:
>>
>>>  Hi all,
>>> The issue I had raised last week is that collection is an important
>>> provenance construct, but assuming only key-value-pair-based
>>> collections is too narrow, and the relations derivedByInsertionFrom and
>>> Derivation-by-Removal are over-specifications that are not required.
>>>
>>> I have collected the following examples of collections, which only
>>> require the definition of collection in DM5 (a collection of entities).
>>> They (a) do not have a key-value structure, and (b) do not need the
>>> derivedByInsertionFrom and derivedByRemovalFrom relations:
>>> 1. A cell line is a collection of cells used in many biomedical
>>> experiments. The provenance of the cell line (as a collection) includes:
>>> who submitted the cell line? What method was used to authenticate the
>>> cell line? When was the given cell line contaminated? The provenance of
>>> the cells in a cell line includes: what is the source of the cells
>>> (e.g. organism)?
>>>
>>> 2. A patient cohort is a collection of patients satisfying some
>>> constraints for a research study. The provenance of the cohort includes:
>>> what eligibility criteria were used to identify the cohort? When was the
>>> cohort identified? The provenance of the patients in a cohort may
>>> include their health provider, etc.
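Satya's examples attach provenance at two levels: to the collection as a whole and to each member, with membership as the only link between them. A minimal sketch, assuming plain Python dictionaries for provenance attributes; the class names, attribute names, and `ex:` identifiers are invented placeholders, not PROV-DM terms or real study data.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    id: str
    provenance: dict = field(default_factory=dict)  # hypothetical attribute bag


@dataclass
class Collection(Entity):
    members: frozenset = frozenset()  # plain membership: no keys, no insertion/removal


cohort = Collection(
    "ex:cohort1",
    provenance={"eligibilityCriteria": "ex:criteria1", "identifiedBy": "ex:studyTeam"},
    members=frozenset({"ex:patient7", "ex:patient9"}),
)
patients = {
    "ex:patient7": Entity("ex:patient7", {"healthProvider": "ex:providerA"}),
    "ex:patient9": Entity("ex:patient9", {"healthProvider": "ex:providerB"}),
}

# Provenance of the collection itself.
print(cohort.provenance["eligibilityCriteria"])  # ex:criteria1
# Provenance of the members, reached through membership only.
print(sorted(patients[p].provenance["healthProvider"] for p in cohort.members))
# ['ex:providerA', 'ex:providerB']
```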
>>>
>>> Hope this helps our discussion.
>>>
>>> Thanks.
>>>
>>> Best,
>>> Satya
>>>
>>>
>>> On Thu, Apr 12, 2012 at 5:06 PM, Luc Moreau <L.Moreau@ecs.soton.ac.uk> wrote:
>>>
>>>
>>>> Hi Jun and Satya,
>>>>
>>>> Following today's call, ACTION-76 [1] and ACTION-77 [2] were raised
>>>> against you, as we agreed.
>>>>
>>>> Cheers,
>>>> Luc
>>>>
>>>> [1] https://www.w3.org/2011/prov/track/actions/76
>>>> [2] https://www.w3.org/2011/prov/track/actions/77
>>>>
>>>>
>>>>
>>>
>>
>>
>
>
>  --
> Jim McCusker
> Programmer Analyst
> Krauthammer Lab, Pathology Informatics
> Yale School of Medicine
> james.mccusker@yale.edu | (203) 785-6330
> http://krauthammerlab.med.yale.edu
>
> PhD Student
> Tetherless World Constellation
> Rensselaer Polytechnic Institute
> mccusj@cs.rpi.edu
> http://tw.rpi.edu
>
>
>
>
>
>
> --
> -----------  ~oo~  --------------
> Paolo Missier - Paolo.Missier@newcastle.ac.uk, pmissier@acm.org
> School of Computing Science, Newcastle University, UK http://www.cs.ncl.ac.uk/people/Paolo.Missier
>
>
Received on Thursday, 19 April 2012 15:36:27 GMT
