- From: Timothy Lebo <lebot@rpi.edu>
- Date: Thu, 19 Apr 2012 08:41:49 -0400
- To: Paolo Missier <Paolo.Missier@ncl.ac.uk>
- Cc: "public-prov-wg@w3.org" <public-prov-wg@w3.org>
- Message-Id: <F41275FF-4BD7-4056-A2D9-E98C18066A01@rpi.edu>
Paolo,

On Apr 19, 2012, at 3:55 AM, Paolo Missier wrote:

> Good morning. Catching up by picking on latest mail for continuity.
> My thoughts:
>
> - I like Tim's proposal to rename the current form of Collections as Dictionaries, as it's what they are -- and the current DM text does acknowledge that.
>
> So let's consider:
>
> 1- static sets (or multisets), like those in Satya & Jun's examples. These are necessary but I will argue, not sufficient. They can answer the question: "what does set s contain?", and "is x member of s?"
>
> 2- sets (or multisets) that are subject to updates. In general the question you want to support is: "how did this set reach its current state?"
> Specific examples given in the past include: programs that manipulate lists (or even simpler sets), or more mundane ones: monitoring people going in and out of a building, counting people who board a plane, etc.
>
> 3- dictionaries, which are sets with more interesting properties (you can use the keys for indexing, you can simulate ordered lists by encoding the position in the key, etc.)
> Incidentally, I find that user-defined keys are more general than conventionally-imposed key names as in RDF (rdf:_1 etc.)
> you can now track dependencies of the form: "entity with key k1 in dictionary d1 was derived from entity with key k2 in dictionary d2", where keys guarantee uniqueness.
>
> (All of the above can be nested structures simply by assuming, as we have done, that elements can be sets themselves)
>
> So what I observe is that 3 subsumes 2 subsumes 1.
>
> One possibility is to have a Set type for 1 and 2 (I see no point having a specific type for 1), and Dictionary for 3. This is done using prov:type.
>
> But then again, why not just have Dictionary. It minimizes the number of definitions.
> If all I need is a set (2), I can just have pairs (e,e) as members

Because it's a bit verbose for a simple case, and the transition from URI to a literal in PROV-O (and casting back and forth) will be a headache.
Although dictionaries _can_ be used for 2 and 1, it's too much effort.
I suggest we keep dictionaries to do dictionary things and stop trying to contort them into their simple cases.

That leaves:

A) We add support for Sets in a direct way
B) We just don't support Sets in a direct way.

In either case, we can have prov:Collection (stripped of all of its current meaning) as a superclass of prov:Dictionary (renamed from prov:Collections) and leave it to someone else to extend prov:Collection to make a simple, boring, their:Set.

-Tim

> --no need to invent keys. If I only need (1), I don't use insert/removal.
>
> Additional thoughts?
>
> -Paolo
>
>
> On 4/19/12 6:31 AM, Luc Moreau wrote:
>>
>> Hi Tim,
>>
>> Your position in favour of prov:dictionary is really clear.
>>
>> Two questions:
>>
>> 1. Is prov:dictionary an essential feature of prov-dm and should stay in the prov-dm document?
>>
>> 2. What about Jun/Satya's request for a simple membership property? Should it be added to prov-dm?
>>
>> Professor Luc Moreau
>> Electronics and Computer Science
>> University of Southampton
>> Southampton SO17 1BJ
>> United Kingdom
>>
>> On 18 Apr 2012, at 23:08, "Timothy Lebo" <lebot@rpi.edu> wrote:
>>
>>> Luc,
>>>
>>> On Apr 18, 2012, at 4:19 PM, Luc Moreau wrote:
>>>
>>>> Dear all,
>>>>
>>>> I just wanted to throw a few ideas/questions to defend collections as they currently are.
>>>>
>>>> 1. prov:Collection is similar to rdfs:Container [1] :
>>>> the properties rdf:_1, rdf:_2, ...[2] map naturally to keys in prov:Collection.
>>>
>>> I don't see how these map.
>>> In prov:Collection, keys have values chosen by the user -- rdfs:Container imposes the rdf:_N "value" for the "key".
>>> rdfs:Container doesn't support keys.
>>>
>>> I think there is consensus that prov:Collection as it stands is _more_ than set membership.
>>> I argue that this more expressive construct is incredibly useful but misleadingly named.
>>>
>>>>
>>>> 2. RDF collections [3] can also be described by prov:Collection, using rdf:first and rdf:rest
>>>> as keys for a collection of two elements, and allowing nesting of collections.
>>>
>>> Although it's true that one can reproduce an rdf:List using the current definition of prov:Collection,
>>> I'm not sure this provides "nesting" in any useful form.
>>> It also shows how prov:Collection is a more general construct than rdf:List.
>>>
>>>>
>>>> So a few questions:
>>>>
>>>> 1. Is it being suggested that rdfs:Container and rdf:List are not appropriate, and we
>>>> should look at other forms of "collections"?
>>>
>>> I'm suggesting we rename "collection" to "dictionary". The confusion is occurring when people read prov:Collection definitions as if it is set membership, which it is not optimized for.
>>> The capabilities that it _is_ optimized for are very useful, should stay, will be used heavily, but should be renamed to something less misleading.
>>>
>>>>
>>>> 2. Has the prov-o ontology encoded prov-dm collections in a way that is lightweight enough?
>>>> Could we for instance restrict the keys to be mapped to properties such as rdf:_1, rdf:_2?
>>>
>>> I'm not sure why we want to contort the eloquence of the Dictionary into something that is less expressive (rdfs:Container), and which has been disregarded for practical uses during the decade that it has been available.
>>>
>>>>
>>>> I however acknowledge that prov:Collection is not "natural" to model a set.
>>>
>>> prov:Dictionary!
>>>
>>>> I suppose that
>>>> like "rdf:Bag class is used conventionally to indicate to a human reader that the container is intended to be unordered",
>>>> we would need a similar notion for expressing sets with prov:Collection.
>>>
>>> We should leave modeling sets to SIOC and RDFS and focus on giving the community something that it doesn't have -- a construct that lets us encode the provenance of function calls with multiple inputs and multiple outputs.
>>>
>>> We don't have a set membership construct and we shouldn't encourage people to misuse a dictionary to model a set.
>>>
>>> -Tim
>>>
>>>>
>>>> Cheers,
>>>> Luc
>>>>
>>>> [1] http://www.w3.org/TR/rdf-schema/#ch_container
>>>> [2] http://www.w3.org/TR/rdf-schema/#ch_containermembershipproperty
>>>> [3] http://www.w3.org/TR/rdf-schema/#ch_collectionvocab
>>>>
>>>> On 18/04/12 19:39, Stephan Zednik wrote:
>>>>>
>>>>> On Apr 18, 2012, at 12:24 PM, Timothy Lebo wrote:
>>>>>
>>>>>> I've had similar concerns that the definitions for collections are "too heavyweight" to manage the membership of sets.
>>>>>>
>>>>>> But while ignoring its name and looking at the modeling construct it provides, it's clear that this construct will be very useful in many real provenance problems (for example, the very ubiquitous need for provenance of function calls with their argument names and bindings).
>>>>>>
>>>>>> Perhaps we can avoid the "too heavyweight for set membership" concerns raised by Satya and Jun by renaming what we have (prov:Collection) to something more appropriate, like prov:Dictionary?
>>>>>
>>>>> +1
>>>>>
>>>>> Jim is right that you can model collections with enumerated classes, but I am not sure about stating the provenance of a collection defined by an enumerated class.
>>>>>
>>>>> We could also define a much simpler prov:Collection class that does not force map/dictionary conventions to go along with prov:Dictionary.
>>>>>
>>>>> --Stephan
>>>>>
>>>>>>
>>>>>> -Tim
>>>>>>
>>>>>> On Apr 18, 2012, at 2:12 PM, Jim McCusker wrote:
>>>>>>
>>>>>>> I think a set of key-value pairs is what's known as a map or dictionary. A collection is a set of things with a defined membership.
>>>>>>> In OWL it would probably be represented as an enumerated class.
>>>>>>>
>>>>>>> Jim
>>>>>>>
>>>>>>> On Wed, Apr 18, 2012 at 1:20 PM, Jun Zhao <jun.zhao@zoo.ox.ac.uk> wrote:
>>>>>>>
>>>>>>> Dear all,
>>>>>>>
>>>>>>> I concur with what Satya wrote. And the example I had in mind is collection type of entities on the blog sphere of the Web.
>>>>>>>
>>>>>>> As we all know SIOC is a widely used vocabulary to describe entities in the online community sites, like blogs, wikis, etc. It has the concept of sioc:Container, which is defined as "a high-level concept used to group content Items together". The relationships between a sioc:Container and the sioc:Items or sioc:Posts that belong to it are described using sioc:container_of and sioc:has_container properties.
>>>>>>>
>>>>>>> The provenance of a sioc:Container could be who is/are responsible for the container, who created this container, and when.
>>>>>>>
>>>>>>> The provenance of a sioc:Post could include when the post was published, when it was modified, by whom, based on which other posts, document or data.
>>>>>>>
>>>>>>> As you see, I am struggling to see how the key-value pair kind of structure could play in the above simple scenario. But please correct me if I am wrong.
>>>>>>>
>>>>>>> HTH,
>>>>>>>
>>>>>>> Jun
>>>>>>>
>>>>>>> On 18/04/2012 18:35, Satya Sahoo wrote:
>>>>>>> Hi all,
>>>>>>> The issue I had raised last week is that collection is an important
>>>>>>> provenance construct, but the assumption of only key-value pair based
>>>>>>> collection is too narrow and the relations derivedByInsertionFrom,
>>>>>>> Derivation-by-Removal are over specifications that are not required.
>>>>>>>
>>>>>>> I have collected the following examples for collection, which only require
>>>>>>> the definition of the collection in DM5 (collection of entities) and they
>>>>>>> don't have (a) a key-value structure, and (b) derivedByInsertionFrom,
>>>>>>> derivedByRemovalFrom relations are not needed:
>>>>>>> 1. Cell line is a collection of cells used in many biomedical experiments.
>>>>>>> The provenance of the cell line (as a collection) include, who submitted
>>>>>>> the cell line, what method was used to authenticate the cell line, when was
>>>>>>> the given cell line contaminated? The provenance of the cells in a cell
>>>>>>> line include, what is the source of the cells (e.g. organism)?
>>>>>>>
>>>>>>> 2. A patient cohort is a collection of patients satisfying some constraints
>>>>>>> for a research study. The provenance of the cohort include, what
>>>>>>> eligibility criteria were used to identify the cohort, when was the cohort
>>>>>>> identified? The provenance of the patients in a cohort may include their
>>>>>>> health provider etc.
>>>>>>>
>>>>>>> Hope this helps our discussion.
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> Best,
>>>>>>> Satya
>>>>>>>
>>>>>>> On Thu, Apr 12, 2012 at 5:06 PM, Luc Moreau <L.Moreau@ecs.soton.ac.uk> wrote:
>>>>>>>
>>>>>>> Hi Jun and Satya,
>>>>>>>
>>>>>>> Following today's call, ACTION-76 [1] and ACTION-77 [2] were raised
>>>>>>> against you, as we agreed.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Luc
>>>>>>>
>>>>>>> [1] https://www.w3.org/2011/prov/track/actions/76
>>>>>>> [2] https://www.w3.org/2011/prov/track/actions/77
>>>>>>>
>>>>>>> --
>>>>>>> Jim McCusker
>>>>>>> Programmer Analyst
>>>>>>> Krauthammer Lab, Pathology Informatics
>>>>>>> Yale School of Medicine
>>>>>>> james.mccusker@yale.edu | (203) 785-6330
>>>>>>> http://krauthammerlab.med.yale.edu
>>>>>>>
>>>>>>> PhD Student
>>>>>>> Tetherless World Constellation
>>>>>>> Rensselaer Polytechnic Institute
>>>>>>> mccusj@cs.rpi.edu
>>>>>>> http://tw.rpi.edu
>
> --
> ----------- ~oo~ --------------
> Paolo Missier - Paolo.Missier@newcastle.ac.uk, pmissier@acm.org
> School of Computing Science, Newcastle University, UK
> http://www.cs.ncl.ac.uk/people/Paolo.Missier
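
To make the three cases in this thread concrete, here is a rough Python sketch of a keyed collection whose every state records the state it was derived from. It is only an illustration under stated assumptions: the class DictState and its insert, remove and history methods are hypothetical names invented for this sketch, not PROV-DM or PROV-O terms, and the links between states only loosely mirror the insertion and removal derivations discussed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DictState:
    """One immutable state of a keyed collection (hypothetical, not a PROV class)."""
    pairs: tuple = ()                            # (key, entity) pairs in insertion order
    derived_from: Optional["DictState"] = None   # the state this one was derived from
    change: str = "empty"                        # description of the step that produced it

    def insert(self, key, entity):
        # Derive a new state by inserting (or overwriting) one key-entity pair.
        members = dict(self.pairs)
        members[key] = entity
        return DictState(tuple(members.items()), self, f"insert {key!r}")

    def remove(self, key):
        # Derive a new state by removing one key, if present.
        members = dict(self.pairs)
        members.pop(key, None)
        return DictState(tuple(members.items()), self, f"remove {key!r}")

    def history(self):
        # Answers "how did this collection reach its current state?"
        steps, state = [], self
        while state.derived_from is not None:
            steps.append(state.change)
            state = state.derived_from
        return list(reversed(steps))


empty = DictState()

# Case 3: a dictionary with user-chosen keys.
d = empty.insert("k1", "e1").insert("k2", "e2").remove("k1")

# Case 2: a set simulated with (e, e) pairs, so no keys need to be invented.
s = empty.insert("alice", "alice").insert("bob", "bob")

# An ordered list simulated by encoding the position in the key.
ordered = empty.insert(0, "first").insert(1, "second")

print(d.history())
print(sorted(dict(s.pairs)))
print([entity for _, entity in sorted(ordered.pairs)])
```

The (e, e) encoding works, but it shows the verbosity objected to above: every member is stated twice, and a plain membership question has to go through the keys.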
Received on Thursday, 19 April 2012 12:59:47 UTC