Re: PROV-ISSUE-474 (instances-and-bundles): Bundles and valid instances [prov-dm-constraints] from Ivan Herman on 2012-08-10 (public-prov-wg@w3.org from August 2012)

From: Ivan Herman <ivan@w3.org>
Date: Fri, 10 Aug 2012 14:04:14 +0200
To: James Cheney <jcheney@inf.ed.ac.uk>
Cc: Simon Miles <simon.miles@kcl.ac.uk>, Provenance Working Group <public-prov-wg@w3.org>
Message-Id: <D6F26D2C-A09B-4E0E-A426-42CD927F138A@w3.org>

On Aug 10, 2012, at 13:54, James Cheney wrote:

[skip]

>> 
>>> I'd rather not invent three new terms for things that directly align with existing terminology, but I appreciate the concern that we not use the same terminology for slightly different things.  So any suggestions for a better term than "dataset" would be welcome.
>>> 
>>> 
>>>> B.t.w., if I use RDF datasets as an analogy: an RDF dataset consists of (G, (n1,G1), ..., (ni,Gi)), where each (ni,Gi) is, to use the current terminology, a named graph (that is the term used in SPARQL) and G is the 'default graph'. By analogy, what about 'default bundle'?
>>>> 
>>> 
>>> Default bundle might be better than toplevel bundle, but for us, "bundle" has generally meant "named set of statements".
>>> 
>> 
>> I am not sure I understand the issue. We can call a bundle a 'set of statements' (yes, it is pretty much like an RDF graph being a set of triples...), and we then have named bundles and one default bundle.
>> 
> 
> I just meant to say that this issue seemed to arise because we were using "bundle" as if it meant "set of statements", whereas in many places it means "NAMED set of statements".  
> Specifically, the PROV-DM definition of bundle is:
> 
> A bundle is a named set of provenance descriptions, and is itself an entity, so allowing provenance of provenance to be expressed.
> 
> 
> although, in prov-n, we distinguish "toplevel bundle" from "named bundles".  It seems to me that using "bundle" on its own in PROV-CONSTRAINTS to refer to either a named or non-named set of statements would be super-confusing.  I think this is part of what Simon was concerned about too.
> 
> I was trying to avoid changes that affect other documents that are out as last call drafts, since I know changes to them are potentially problematic.
> 
> Would changing the above definition to something like "A bundle is a set of provenance descriptions, and is itself an entity that may have an identifier, so allowing provenance of provenance to be expressed."
> 
> work for you, and would it be possible to make such a change to PROV-DM after last call?  
> 

Yes, it would work, and I would not have a problem with a last call change of this sort; as far as I can see, this does not change the technical design of PROV-DM. It is only a naming issue.

Ivan


> 
> --James
> 
>> Ivan
>> 
>> 
>>> So another terminology could be:
>>> 
>>> - "instance" - the whole thing (toplevel bundle + named bundles)
>>> - "bundle" - any set of statements (named or unnamed)
>>> - "named bundle" - a named set of statements
>>> - "default bundle" - the unnamed set of statements at the toplevel (what we were calling "toplevel bundle" or "toplevel instance" in my recent revision).
>>> 
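The four terms in the list above can be pictured as a small data structure. This is only a minimal sketch, not anything taken from PROV-CONSTRAINTS or PROV-N: the class name `Instance`, the type alias `Statement`, and the method names are hypothetical, statements are kept as opaque strings, and the distinct-names rule is borrowed from the "dataset" definition quoted further down the thread.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterable, Optional

# Hypothetical: a PROV statement is kept as an opaque string in this sketch.
Statement = str


@dataclass
class Instance:
    """The whole thing: one default (unnamed) bundle plus named bundles."""

    default_bundle: FrozenSet[Statement] = frozenset()
    named_bundles: Dict[str, FrozenSet[Statement]] = field(default_factory=dict)

    def add_named_bundle(self, name: str, statements: Iterable[Statement]) -> None:
        # Assumption: bundle names must be distinct within one instance.
        if name in self.named_bundles:
            raise ValueError(f"duplicate bundle name: {name}")
        self.named_bundles[name] = frozenset(statements)

    def bundles(self) -> Dict[Optional[str], FrozenSet[Statement]]:
        """Every bundle, named or unnamed; the default bundle is keyed by None."""
        return {None: self.default_bundle, **self.named_bundles}
```

With this shape, "bundle" covers both kinds of statement set, and the default bundle is simply the one without a name, much like the default graph G in the RDF dataset analogy.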
>>> Simon, would that be acceptable instead of "dataset", "instance", "bundle", "toplevel instance"?  Are there lots of places where we say that bundles are named, that would have to change to draw this distinction?
>>> 
>>> One advantage of this would be that prov-n doesn't need to change (except maybe renaming "toplevel" to "default").  
>>> 
>>> I think we can probably keep this change independent of technical content, so that we can align (or not) with RDF 1.1 later, in any case.
>>> 
>>> --James
>>> 
>>> 
>>> 
>>>> Ivan
>>>> 
>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Isn't it the case that an instance (which is a prov-constraint concept and not a prov-n concept)
>>>>>> is a set of statements or a bundle or a toplevel-bundle/dataset?
>>>>> 
>>>>> I am now proposing that we use "instance" solely for "set of statements".  If this term is only used in this sense in prov-constraints, then it seems that we are free to redefine it, within reason.  Most of the document concerns instances, so the number of changes was small. For cohesion, if we talk about sets of statements elsewhere it might be sensible to call them "instances", but I don't insist on it, nor do I insist on the use of "dataset" elsewhere.
>>>>> 
>>>>> --James
>>>>> 
>>>>> 
>>>>>> 
>>>>>> Luc
>>>>>> 
>>>>>> 
>>>>>> On 09/08/12 18:03, James Cheney wrote:
>>>>>>> OK.  I have done a quick pass to use the term "PROV dataset" and changed all occurrences of "toplevel bundle" to "toplevel instance".  I think it's a lot better this way!
>>>>>>> 
>>>>>>> instance = set of statements.  (Excluding "bundle" constructs, which are not statements.)
>>>>>>> bundle = named set of statements ~= named graph of PROV-O (hopefully!)
>>>>>>> dataset = an instance and zero or more bundles (with distinct names).
>>>>>>> toplevel instance = the set of statements at the toplevel of a dataset
>>>>>>> 
>>>>>>> Modulo typos/snags, does this look OK?  If so I will close.
>>>>>>> 
>>>>>>> Perhaps this terminology would be useful in other documents (Luc pointed out PROV-N uses "toplevel bundle" too...).
>>>>>>> 
>>>>>>> --James
>>>>>>> 
>>>>>>> On Aug 9, 2012, at 5:41 PM, Miles, Simon wrote:
>>>>>>> 
>>>>>>>> Hello James,
>>>>>>>> 
>>>>>>>> I strongly agree with the suggested general solution. I have no objection to "dataset" as a term. If you do still need to talk about bundles at all in PROV-Constraints, I think it should be made clear that the "toplevel" does not need to be named (does not need to be a bundle) to avoid confusion of concepts for different purposes.
>>>>>>>> 
>>>>>>>> As said on the IRC, I don't think this is a blocking issue, just a matter of text clarification.
>>>>>>>> 
>>>>>>>> thanks,
>>>>>>>> Simon
>>>>>>>> 
>>>>>>>> Dr Simon Miles
>>>>>>>> Senior Lecturer, Department of Informatics
>>>>>>>> King's College London, WC2R 2LS, UK
>>>>>>>> +44 (0)20 7848 1166
>>>>>>>> 
>>>>>>>> Evolutionary Testing of Autonomous Software Agents:
>>>>>>>> http://eprints.dcs.kcl.ac.uk/1370/
>>>>>>>> ________________________________________
>>>>>>>> From: James Cheney [jcheney@inf.ed.ac.uk]
>>>>>>>> Sent: 09 August 2012 17:21
>>>>>>>> To: Provenance Working Group
>>>>>>>> Subject: Re: PROV-ISSUE-474 (instances-and-bundles): Bundles and valid instances [prov-dm-constraints]
>>>>>>>> 
>>>>>>>> We discussed this in the teleconference and it sounded like it would be appropriate to find better terminology for the following three things, which are currently not clearly distinguished:
>>>>>>>> 
>>>>>>>> - "the whole PROV instance, including set of toplevel statements and bundles"
>>>>>>>> - "a particular set of statements, either the toplevel one or one within a bundle"
>>>>>>>> - bundle = "a named set of provenance statements"
>>>>>>>> 
>>>>>>>> My initial proposal is "PROV dataset", "PROV instance", and "bundle".  I believe "PROV dataset" is roughly analogous to what people call "dataset" in the context of SPARQL; if anyone knows different (or has objections or better suggestions), let me know.
>>>>>>>> 
>>>>>>>> I'll send another message on this when this is ready for review.
>>>>>>>> 
>>>>>>>> --James
>>>>>>>> 
>>>>>>>> On Aug 9, 2012, at 3:45 PM, Provenance Working Group Issue Tracker wrote:
>>>>>>>> 
>>>>>>>>> PROV-ISSUE-474 (instances-and-bundles): Bundles and valid instances [prov-dm-constraints]
>>>>>>>>> 
>>>>>>>>> http://www.w3.org/2011/prov/track/issues/474
>>>>>>>>> 
>>>>>>>>> Raised by: Simon Miles
>>>>>>>>> On product: prov-dm-constraints
>>>>>>>>> 
>>>>>>>>> As requested, I'm submitting an issue where I feel a PROV-Constraints review comment of mine is not completely answered.
>>>>>>>>> 
>>>>>>>>> My original comment:
>>>>>>>>>> Bundles
>>>>>>>>>> -------
>>>>>>>>>> F. Section 6.1 seems a bit out of the blue. "The definitions
>>>>>>>>>> [etc.]... assume a PROV instance with exactly one bundle", and then
>>>>>>>>>> multiple bundles are handled as exactly the same number of
>>>>>>>>>> instances. Why? Why is there a connection between number of instances
>>>>>>>>>> and number of bundles? Why would a bundle be considered to be only one
>>>>>>>>>> instance? I thought a bundle was an identified set of statements,
>>>>>>>>>> allowing for provenance of provenance, which seems a distinct matter
>>>>>>>>>> from whether a set of statements are valid. It seems fine for a user
>>>>>>>>>> to treat one bundle as one instance if they want to, but there's no
>>>>>>>>>> reason given why this is the general case.
>>>>>>>>> Response from editors:
>>>>>>>>>> I am not sure I understand this comment.  However, I have rewritten
>>>>>>>>>> slightly the intro of section 6.1.
>>>>>>>>>> 
>>>>>>>>>> "The definitions, inferences, and constraints, and the resulting notions of normalization, validity and equivalence, assume a PROV instance that consists of exactly one bundle, the toplevel bundle, containing all PROV statements in the top level of the bundle (that is, not enclosed in a named bundle). In this section, we describe how to deal with PROV instances consisting of multiple named bundles. Briefly, each bundle is handled independently; there is no interaction between bundles from the perspective of applying definitions, inferences, or constraints, computing normal forms, or checking validity or equivalence."
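The "each bundle is handled independently" rule in the rewritten intro can be sketched as follows. This is an illustration under stated assumptions, not the PROV-CONSTRAINTS algorithm: `check_bundles` and `toy_is_valid` are hypothetical names, statements are opaque strings, the toplevel set of statements is keyed by `None`, and the real validity check is replaced by a caller-supplied predicate.

```python
from typing import Callable, Dict, FrozenSet, Optional

# Hypothetical: a PROV statement is kept as an opaque string in this sketch.
Statement = str
Bundle = FrozenSet[Statement]


def check_bundles(
    bundles: Dict[Optional[str], Bundle],
    is_valid: Callable[[Bundle], bool],
) -> Dict[Optional[str], bool]:
    """Apply the validity predicate to each bundle on its own.

    No statement from one bundle is visible while another is checked,
    so there is no interaction between bundles.
    """
    return {name: is_valid(statements) for name, statements in bundles.items()}


def toy_is_valid(bundle: Bundle) -> bool:
    # Placeholder predicate only; the real check would apply the
    # PROV-CONSTRAINTS definitions, inferences, and constraints.
    return "invalid" not in bundle
```

A call such as `check_bundles({None: toplevel, "ex:b1": b1}, toy_is_valid)` gives one verdict per bundle, so an invalid named bundle does not affect the verdict for the toplevel statements.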
>>>>>>>>> I agree this is clearer, but I don't feel it answers the key questions in my comment. To put my comment another way: you have explained checking validity where an instance consists of one bundle and of multiple bundles. The two other possibilities I see are:
>>>>>>>>> (a) A bundle containing multiple instances;
>>>>>>>>> (b) An instance that is a collection of PROV descriptions with no identifier and so is not a bundle, e.g. a provenance service query result.
>>>>>>>>> 
>>>>>>>>> How do we deal with each of these cases? Or, if they cannot occur, why not?
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Simon
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> The University of Edinburgh is a charitable body, registered in
>>>>>>>> Scotland, with registration number SC005336.
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> -- 
>>>>>> Professor Luc Moreau
>>>>>> Electronics and Computer Science   tel:   +44 23 8059 4487
>>>>>> University of Southampton          fax:   +44 23 8059 2865
>>>>>> Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
>>>>>> United Kingdom                     http://www.ecs.soton.ac.uk/~lavm
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> ----
>>>> Ivan Herman, W3C Semantic Web Activity Lead
>>>> Home: http://www.w3.org/People/Ivan/
>>>> mobile: +31-641044153
>>>> FOAF: http://www.ivan-herman.net/foaf.rdf
>>>> 
>>> 
>>> 
>>> 
>> 
>> 
> 


----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Friday, 10 August 2012 12:04:42 UTC