Re: PROV-ISSUE-474 (instances-and-bundles): Bundles and valid instances [prov-dm-constraints]

Just a side-issue:

On Aug 10, 2012, at 09:40 , James Cheney wrote:
> [snip]


> 
> Part of Simon's point was that what I was calling a toplevel-bundle is not really a bundle, just a set of statements.  What prov-n is calling a toplevel-bundle is also not really a bundle: it might have multiple bundles or none, along with an unnamed set of statements.  So the terminology is confusing (I agree).
> 
> So I expect Simon might suggest that we avoid the use of toplevel-bundle in prov-n too; if he doesn't, I will.  Calling it a dataset would be fine.
> 

That could lead to confusion. The term 'dataset' is already used in the SW world, namely in SPARQL. It *may* be the term adopted by RDF 1.1 for a collection of named graphs and, actually, it *may* be the right abstraction for Prov, too, but... we are not yet sure. And if we end up using the same term but with a different meaning then, well, all hell breaks loose :-)

B.t.w., if I use RDF datasets as an analogy: a dataset consists of (G, (n1,G1), ..., (ni,Gi)), where each (ni,Gi) is, to use the current terminology, a named graph (that is the term used in SPARQL) and G is the 'default graph'. By analogy, what about 'default bundle'?
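
To make the analogy a bit more concrete, here is a rough sketch of the (G, (n1,G1), ..., (ni,Gi)) shape. It is only an illustration: the names Dataset, default, named and add_named are made up for this mail, statements are represented as plain strings, and none of it comes from the SPARQL or PROV drafts.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterable

# Hypothetical stand-in: a statement is just an opaque string here.
Statement = str


@dataclass
class Dataset:
    """Sketch of (G, (n1,G1), ..., (ni,Gi)): one unnamed set plus named sets."""

    # G: the unnamed 'default graph' (or, by analogy, the 'default bundle').
    default: FrozenSet[Statement] = frozenset()
    # (ni, Gi): named sets of statements, keyed by name.
    named: Dict[str, FrozenSet[Statement]] = field(default_factory=dict)

    def add_named(self, name: str, statements: Iterable[Statement]) -> None:
        # Names must be distinct within one dataset.
        if name in self.named:
            raise ValueError(f"duplicate name: {name}")
        self.named[name] = frozenset(statements)


# Usage: an unnamed default part plus one named part.
ds = Dataset(default=frozenset({"entity(ex:e1)"}))
ds.add_named("ex:b1", ["activity(ex:a1)"])
```

The point of the sketch is simply that the default part has no name at all, while every other part is reached through its name; a dataset with an empty dict of named parts is still a dataset.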

Ivan


>> 
>> 
>> 
>> Isn't it the case that an instance (which is a prov-constraint concept and not a prov-n concept)
>> is a set of statements or a bundle or a toplevel-bundle/dataset?
> 
> I am now proposing that we use "instance" solely for "set of statements".  If this term is only used in this sense in prov-constraints, then it seems that we are free to redefine it, within reason.  Most of the document concerns instances, so the number of changes was small. For cohesion, if we talk about sets of statements elsewhere it might be sensible to call them "instances", but I don't insist on it, nor do I insist on the use of "dataset" elsewhere.
> 
> --James
> 
> 
>> 
>> Luc
>> 
>> 
>> On 09/08/12 18:03, James Cheney wrote:
>>> OK.  I have done a quick pass to use the term "PROV dataset" and changed all occurrences of "toplevel bundle" to "toplevel instance".  I think it's a lot better this way!
>>> 
>>> instance = set of statements.  (Excluding "bundle" constructs, which are not statements.)
>>> bundle = named set of statements ~= named graph of PROV-O (hopefully!)
>>> dataset = an instance and zero or more bundles (with distinct names).
>>> toplevel instance = the set of statements at the toplevel of a dataset
>>> 
>>> Modulo typos/snags, does this look OK?  If so I will close.
>>> 
>>> Perhaps this terminology would be useful in other documents (Luc pointed out PROV-N uses "toplevel bundle" too...).
>>> 
>>> --James
>>> 
>>> On Aug 9, 2012, at 5:41 PM, Miles, Simon wrote:
>>> 
>>>> Hello James,
>>>> 
>>>> I strongly agree with the suggested general solution. I have no objection to "dataset" as a term. If you do still need to talk about bundles at all in PROV-Constraints, I think it should be made clear that the "toplevel" does not need to be named (does not need to be a bundle) to avoid confusion of concepts for different purposes.
>>>> 
>>>> As said on the IRC, I don't think this is a blocking issue, just a matter of text clarification.
>>>> 
>>>> thanks,
>>>> Simon
>>>> 
>>>> Dr Simon Miles
>>>> Senior Lecturer, Department of Informatics
>>>> Kings College London, WC2R 2LS, UK
>>>> +44 (0)20 7848 1166
>>>> 
>>>> Evolutionary Testing of Autonomous Software Agents:
>>>> http://eprints.dcs.kcl.ac.uk/1370/
>>>> ________________________________________
>>>> From: James Cheney [jcheney@inf.ed.ac.uk]
>>>> Sent: 09 August 2012 17:21
>>>> To: Provenance Working Group
>>>> Subject: Re: PROV-ISSUE-474 (instances-and-bundles): Bundles and valid instances [prov-dm-constraints]
>>>> 
>>>> We discussed this in the teleconference and it sounded like it would be appropriate to find better terminology for the following three things, which are currently not clearly distinguished:
>>>> 
>>>> - "the whole PROV instance, including set of toplevel statements and bundles"
>>>> - "a particular set of statements, either the toplevel one or one within a bundle"
>>>> - bundle = "a named set of provenance statements"
>>>> 
>>>> My initial proposal is "PROV dataset", "PROV instance", and "bundle".  I believe "PROV dataset" is roughly analogous to what people call "dataset" in the context of SPARQL; if anyone knows different (or has objections or better suggestions), let me know.
>>>> 
>>>> I'll send another message on this when this is ready for review.
>>>> 
>>>> --James
>>>> 
>>>> On Aug 9, 2012, at 3:45 PM, Provenance Working Group Issue Tracker wrote:
>>>> 
>>>>> PROV-ISSUE-474 (instances-and-bundles): Bundles and valid instances [prov-dm-constraints]
>>>>> 
>>>>> http://www.w3.org/2011/prov/track/issues/474
>>>>> 
>>>>> Raised by: Simon Miles
>>>>> On product: prov-dm-constraints
>>>>> 
>>>>> As requested, I'm submitting an issue where I feel a PROV-Constraints review comment of mine is not completely answered.
>>>>> 
>>>>> My original comment:
>>>>>> Bundles
>>>>>> -------
>>>>>> F. Section 6.1 seems a bit out of the blue. "The definitions
>>>>>> [etc.]... assume a PROV instance with exactly one bundle", and then
>>>>>> multiple bundles are handled as exactly the same number of
>>>>>> instances. Why? Why is there a connection between number of instances
>>>>>> and number of bundles? Why would a bundle be considered to be only one
>>>>>> instance? I thought a bundle was an identified set of statements,
>>>>>> allowing for provenance of provenance, which seems a distinct matter
>>>>>> from whether a set of statements are valid. It seems fine for a user
>>>>>> to treat one bundle as one instance if they want to, but there's no
>>>>>> reason given why this is the general case.
>>>>> Response from editors:
>>>>>> I am not sure I understand this comment.  However, I have rewritten
>>>>>> slightly the intro of section 6.1.
>>>>>> 
>>>>>> "The definitions, inferences, and constraints, and the resulting notions of normalization, validity and equivalence, assume a PROV instance that consists of exactly one bundle, the toplevel bundle, containing all PROV statements in the top level of the bundle (that is, not enclosed in a named bundle). In this section, we describe how to deal with PROV instances consisting of multiple named bundles. Briefly, each bundle is handled independently; there is no interaction between bundles from the perspective of applying definitions, inferences, or constraints, computing normal forms, or checking validity or equivalence."
>>>>> I agree this is clearer, but I don't feel it answers the key questions in my comment. To put my comment another way: you have explained checking validity where an instance consists of one bundle and of multiple bundles. The two other possibilities I see are:
>>>>> (a) A bundle containing multiple instances;
>>>>> (b) An instance that is a collection of PROV descriptions with no identifier and so is not a bundle, e.g. a provenance service query result.
>>>>> 
>>>>> How do we deal with each of these cases? Or, if they cannot occur, why not?
>>>>> 
>>>>> Thanks,
>>>>> Simon
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> --
>>>> The University of Edinburgh is a charitable body, registered in
>>>> Scotland, with registration number SC005336.
>>>> 
>>> 
>> 
>> -- 
>> Professor Luc Moreau
>> Electronics and Computer Science   tel:   +44 23 8059 4487
>> University of Southampton          fax:   +44 23 8059 2865
>> Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
>> United Kingdom                     http://www.ecs.soton.ac.uk/~lavm
>> 
>> 
>> 
> 
> 
> 
> 


----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
FOAF: http://www.ivan-herman.net/foaf.rdf

Received on Friday, 10 August 2012 11:21:21 UTC