Re: PROV-ISSUE-474 (instances-and-bundles): Bundles and valid instances [prov-dm-constraints]

OK, so the prov-constraints document up to section 6.2 only discusses
prov instances, i.e. unnamed sets of statements.

Section 6.2 shows how to extend the validation procedure to
bundles (named sets of statements) and datasets (multiple bundles or none,
along with an unnamed set of statements).

I am fine with this. This was my understanding.
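
As a rough illustration of this terminology, here is a minimal Python
sketch. It is not taken from prov-constraints or any PROV library: the
names Statement, Instance, Dataset, toplevel, bundles and dataset_is_valid
are all hypothetical, statements are treated as opaque strings, and the
actual validity check on an instance is assumed to be supplied by the
caller. It only shows the shape of the idea: an instance is an unnamed set
of statements, a dataset is one such instance plus zero or more named
bundles, and each set of statements is checked independently.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet

# Hypothetical modelling choice: a statement is an opaque string,
# e.g. "entity(e)" or "wasGeneratedBy(a,e)".
Statement = str

# An instance is an unnamed set of statements.
Instance = FrozenSet[Statement]


@dataclass
class Dataset:
    """An unnamed toplevel instance plus zero or more named bundles."""

    toplevel: Instance = frozenset()
    # Using a dict keyed by bundle name keeps bundle names distinct.
    bundles: Dict[str, Instance] = field(default_factory=dict)


def dataset_is_valid(
    ds: Dataset, instance_is_valid: Callable[[Instance], bool]
) -> bool:
    """Check the toplevel instance and every bundle independently.

    The per-instance check is passed in by the caller; nothing here
    implements the actual PROV constraints.
    """
    if not instance_is_valid(ds.toplevel):
        return False
    return all(instance_is_valid(stmts) for stmts in ds.bundles.values())
```

A dataset with no bundles is then simply Dataset(toplevel=...), and a
bundle is an entry such as bundles={"b1": frozenset({"entity(e)"})}.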

Luc



On 10/08/12 08:40, James Cheney wrote:
> On Aug 10, 2012, at 5:25 AM, Luc Moreau wrote:
>
>> Hi James,
>> I don't think it's the clearest.
>> Are both bundle and instance named sets of statements?
>> I wouldn't know which of these terms to use in prov-n.
> Argh.  Writing too many emails too fast at the same time, I saw that I wrote:
>
>>> instance = named set of statements.  (Excluding "bundle" constructs, which are not statements.)
> but this should be:
>
> instance = unnamed set of statements.  (Excluding "bundle" constructs, which are not statements.)
>
>
>> prov-n has :
>> - statements:  e.g. entity(e), wasGeneratedBy(a,e)
>> - a construct bundle, which gives a name to a set of statements
>> - a construct toplevel-bundle, which combines a set of statements, and bundles
>>
>> Are you suggesting to rename toplevel-bundle to dataset?
>
> Part of Simon's point was that what I was calling a toplevel-bundle is not really a bundle, just a set of statements.  What prov-n is calling a toplevel-bundle is also not really a bundle: it might have multiple bundles or none, along with an unnamed set of statements.  So the terminology is confusing (I agree).
>
> So I expect Simon might suggest that we avoid the use of toplevel-bundle in prov-n too; if he doesn't, I will.  Calling it a dataset would be fine.
>
>>
>>
>> Isn't it the case that an instance (which is a prov-constraint concept and not a prov-n concept)
>> is a set of statements or a bundle or a toplevel-bundle/dataset?
> I am now proposing that we use "instance" solely for "set of statements".  If this term is only used in this sense in prov-constraints, then it seems that we are free to redefine it, within reason.  Most of the document concerns instances, so the number of changes was small. For cohesion, if we talk about sets of statements elsewhere it might be sensible to call them "instances", but I don't insist on it, nor do I insist on the use of "dataset" elsewhere.
>
> --James
>
>
>> Luc
>>
>>
>> On 09/08/12 18:03, James Cheney wrote:
>>> OK.  I have done a quick pass to use the term "PROV dataset" and changed all occurrences of "toplevel bundle" to "toplevel instance".  I think it's a lot better this way!
>>>
>>> instance = named set of statements.  (Excluding "bundle" constructs, which are not statements.)
>>> bundle = named set of statements ~= named graph of PROV-O (hopefully!)
>>> dataset = an instance and zero or more bundles (with distinct names).
>>> toplevel instance = the set of statements at the toplevel of a dataset
>>>
>>> Modulo typos/snags, does this look OK?  If so I will close.
>>>
>>> Perhaps this terminology would be useful in other documents (Luc pointed out PROV-N uses "toplevel bundle" too...).
>>>
>>> --James
>>>
>>> On Aug 9, 2012, at 5:41 PM, Miles, Simon wrote:
>>>
>>>> Hello James,
>>>>
>>>> I strongly agree with the suggested general solution. I have no objection to "dataset" as a term. If you do still need to talk about bundles at all in PROV-Constraints, I think it should be made clear that the "toplevel" does not need to be named (does not need to be a bundle) to avoid confusion of concepts for different purposes.
>>>>
>>>> As said on the IRC, I don't think this is a blocking issue, just a matter of text clarification.
>>>>
>>>> thanks,
>>>> Simon
>>>>
>>>> Dr Simon Miles
>>>> Senior Lecturer, Department of Informatics
>>>> Kings College London, WC2R 2LS, UK
>>>> +44 (0)20 7848 1166
>>>>
>>>> Evolutionary Testing of Autonomous Software Agents:
>>>> http://eprints.dcs.kcl.ac.uk/1370/
>>>> ________________________________________
>>>> From: James Cheney [jcheney@inf.ed.ac.uk]
>>>> Sent: 09 August 2012 17:21
>>>> To: Provenance Working Group
>>>> Subject: Re: PROV-ISSUE-474 (instances-and-bundles): Bundles and valid instances [prov-dm-constraints]
>>>>
>>>> We discussed this in the teleconference and it sounded like it would be appropriate to find better terminology for the following three things, which are currently not clearly distinguished:
>>>>
>>>> - "the whole PROV instance, including set of toplevel statements and bundles"
>>>> - "a particular set of statements, either the toplevel one or one within a bundle"
>>>> - bundle = "a named set of provenance statements"
>>>>
>>>> My initial proposal is "PROV dataset", "PROV instance", and "bundle".  I believe "PROV dataset" is roughly analogous to what people call "dataset" in the context of SPARQL; if anyone knows different (or has objections or better suggestions), let me know.
>>>>
>>>> I'll send another message on this when this is ready for review.
>>>>
>>>> --James
>>>>
>>>> On Aug 9, 2012, at 3:45 PM, Provenance Working Group Issue Tracker wrote:
>>>>
>>>>> PROV-ISSUE-474 (instances-and-bundles): Bundles and valid instances [prov-dm-constraints]
>>>>>
>>>>> http://www.w3.org/2011/prov/track/issues/474
>>>>>
>>>>> Raised by: Simon Miles
>>>>> On product: prov-dm-constraints
>>>>>
>>>>> As requested, I'm submitting an issue where I feel a PROV-Constraints review comment of mine is not completely answered.
>>>>>
>>>>> My original comment:
>>>>>> Bundles
>>>>>> -------
>>>>>> F. Section 6.1 seems a bit out of the blue. "The definitions
>>>>>> [etc.]... assume a PROV instance with exactly one bundle", and then
>>>>>> multiple bundles are handled as exactly the same number of
>>>>>> instances. Why? Why is there a connection between number of instances
>>>>>> and number of bundles? Why would a bundle be considered to be only one
>>>>>> instance? I thought a bundle was an identified set of statements,
>>>>>> allowing for provenance of provenance, which seems a distinct matter
>>>>>> from whether a set of statements are valid. It seems fine for a user
>>>>>> to treat one bundle as one instance if they want to, but there's no
>>>>>> reason given why this is the general case.
>>>>> Response from editors:
>>>>>> I am not sure I understand this comment.  However, I have rewritten
>>>>>> slightly the intro of section 6.1.
>>>>>>
>>>>>> "The definitions, inferences, and constraints, and the resulting notions of normalization, validity and equivalence, assume a PROV instance that consists of exactly one bundle, the toplevel bundle, containing all PROV statements in the top level of the bundle (that is, not enclosed in a named bundle). In this section, we describe how to deal with PROV instances consisting of multiple named bundles. Briefly, each bundle is handled independently; there is no interaction between bundles from the perspective of applying definitions, inferences, or constraints, computing normal forms, or checking validity or equivalence."
>>>>> I agree this is clearer, but I don't feel it answers the key questions in my comment. To put my comment another way: you have explained checking validity where an instance consists of one bundle and of multiple bundles. The two other possibilities I see are:
>>>>> (a) A bundle containing multiple instances;
>>>>> (b) An instance that is a collection of PROV descriptions with no identifier and so is not a bundle, e.g. a provenance service query result.
>>>>>
>>>>> How do we deal with each of these cases? Or, if they cannot occur, why not?
>>>>>
>>>>> Thanks,
>>>>> Simon
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>> -- 
>> Professor Luc Moreau
>> Electronics and Computer Science   tel:   +44 23 8059 4487
>> University of Southampton          fax:   +44 23 8059 2865
>> Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
>> United Kingdom                     http://www.ecs.soton.ac.uk/~lavm
>>
>>
>>
>

-- 
Professor Luc Moreau
Electronics and Computer Science   tel:   +44 23 8059 4487
University of Southampton          fax:   +44 23 8059 2865
Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
United Kingdom                     http://www.ecs.soton.ac.uk/~lavm

Received on Friday, 10 August 2012 07:55:51 UTC