Re: PROV-ISSUE-474 (instances-and-bundles): Bundles and valid instances [prov-dm-constraints] from James Cheney on 2012-08-14 (public-prov-wg@w3.org from August 2012)

From: James Cheney <jcheney@inf.ed.ac.uk>
Date: Tue, 14 Aug 2012 10:44:48 +0100
To: Graham Klyne <GK@ninebynine.org>
Cc: "Miles, Simon" <simon.miles@kcl.ac.uk>, Provenance Working Group <public-prov-wg@w3.org>
Message-Id: <BA2AB0A5-2D59-410B-B4A9-03A47046078E@inf.ed.ac.uk>
Hi Graham,


On Aug 13, 2012, at 9:20 AM, Graham Klyne wrote:

> James,
> 
> Mainly, I wanted to say that it will be very helpful if a PROV Dataset is structurally and semantically aligned with a SPARQL/RDF 1.1 Dataset.  (SPARQL defines no dataset semantics, but I understand the RDF 1.1 group have adopted the structure for "named graphs" in RDF, so will hopefully also define appropriate RDF semantics.)
> 

Yes, that was what I had in mind.  Ivan raised the point that using "Dataset" could be problematic if it turns out that what RDF means by Dataset isn't quite aligned.  For now, I have changed to "PROV Document" instead; hopefully we can rename back later, if appropriate.  I hope you are also happy with "document" as a compromise.  In any case, I expect an RDF dataset that contains PROV-O style RDF will be an example of a "PROV document".

> From this email, I find the distinction between "instance" and "bundle" to be unclear.  Also, when you say a "bundle" is not a "statement", what do you mean here by "statement" - I'm offline, can't check the source right now, so my apologies if this is covered in the document.  [later] I see that was a typo, but I'm still left wondering what you mean by "not a statement"

What we call a "statement" in prov-constraints is an (abstract) syntactic object that matches the "expression" production in prov-n.  So an "instance" is a set of statements (or an RDF graph in prov-o).  "Bundle" constructs do not match that production, so they are not part of an instance, just as, by analogy, an RDF named graph is not an RDF triple.

At some point we were using a mix of "statement" and "expression" for these things, and it got normalized to "statement" for reasons I do not recall.  Maybe we should change back to "expression" to match PROV-N.  I am happy with either.
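
To make the terminology above concrete, here is a minimal sketch in Python.  It is not taken from prov-constraints or prov-n: the names Statement, Instance, Document, add_bundle and instances are all hypothetical, and the statement strings are only illustrative.  It assumes an instance is an unnamed set of statements, a bundle is a name attached to such a set, and a document is one toplevel instance plus zero or more bundles with distinct names.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterator


@dataclass(frozen=True)
class Statement:
    """Hypothetical stand-in for anything matching the prov-n "expression" production."""
    text: str


# Assumption: an instance is simply a set of statements and carries no name.
Instance = FrozenSet[Statement]


@dataclass
class Document:
    """Hypothetical "PROV document": a toplevel instance plus named bundles."""
    toplevel: Instance = frozenset()
    # A bundle is modelled as a name mapped to an instance; it is not a
    # Statement, so it can never appear inside an instance.
    bundles: Dict[str, Instance] = field(default_factory=dict)

    def add_bundle(self, name: str, statements: Instance) -> None:
        # Bundle names must be distinct within one document.
        if name in self.bundles:
            raise ValueError(f"duplicate bundle name: {name}")
        self.bundles[name] = statements

    def instances(self) -> Iterator[Instance]:
        """Yield the toplevel instance, then the instance inside each bundle."""
        yield self.toplevel
        yield from self.bundles.values()


doc = Document(toplevel=frozenset({Statement("entity(ex:report)")}))
doc.add_bundle("ex:b1", frozenset({Statement("activity(ex:write)")}))
```

Under these assumptions, anything that applies to "an instance" (normalization, validity checking) would be run once per element of doc.instances(), with no interaction between the toplevel and the bundles.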

--James

> 
> #g
> --
> 
> On 09/08/2012 18:03, James Cheney wrote:
>> OK.  I have done a quick pass to use the term "PROV dataset" and changed all occurrences of "toplevel bundle" to "toplevel instance".  I think it's a lot better this way!
>> 
>> instance = named set of statements.  (Excluding "bundle" constructs, which are not statements.)
>> bundle = named set of statements ~= named graph of PROV-O (hopefully!)
>> dataset = an instance and zero or more bundles (with distinct names).
>> toplevel instance = the set of statements at the toplevel of a dataset
>> 
>> Modulo typos/snags, does this look OK?  If so I will close.
>> 
>> Perhaps this terminology would be useful in other documents (Luc pointed out PROV-N uses "toplevel bundle" too...).
>> 
>> --James
>> 
>> On Aug 9, 2012, at 5:41 PM, Miles, Simon wrote:
>> 
>>> Hello James,
>>> 
>>> I strongly agree with the suggested general solution. I have no objection to "dataset" as a term. If you do still need to talk about bundles at all in PROV-Constraints, I think it should be made clear that the "toplevel" does not need to be named (does not need to be a bundle) to avoid confusion of concepts for different purposes.
>>> 
>>> As said on the IRC, I don't think this is a blocking issue, just a matter of text clarification.
>>> 
>>> thanks,
>>> Simon
>>> 
>>> Dr Simon Miles
>>> Senior Lecturer, Department of Informatics
>>> Kings College London, WC2R 2LS, UK
>>> +44 (0)20 7848 1166
>>> 
>>> Evolutionary Testing of Autonomous Software Agents:
>>> http://eprints.dcs.kcl.ac.uk/1370/
>>> ________________________________________
>>> From: James Cheney [jcheney@inf.ed.ac.uk]
>>> Sent: 09 August 2012 17:21
>>> To: Provenance Working Group
>>> Subject: Re: PROV-ISSUE-474 (instances-and-bundles): Bundles and valid instances [prov-dm-constraints]
>>> 
>>> We discussed this in the teleconference and it sounded like it would be appropriate to find better terminology for the following three things, which are currently not clearly distinguished:
>>> 
>>> - "the whole PROV instance, including set of toplevel statements and bundles"
>>> - "a particular set of statements, either the toplevel one or one within a bundle"
>>> - bundle = "a named set of provenance statements"
>>> 
>>> My initial proposal is "PROV dataset", "PROV instance", and "bundle".  I believe "PROV dataset" is roughly analogous to what people call "dataset" in the context of SPARQL; if anyone knows different (or has objections or better suggestions), let me know.
>>> 
>>> I'll send another message on this when this is ready for review.
>>> 
>>> --James
>>> 
>>> On Aug 9, 2012, at 3:45 PM, Provenance Working Group Issue Tracker wrote:
>>> 
>>>> PROV-ISSUE-474 (instances-and-bundles): Bundles and valid instances [prov-dm-constraints]
>>>> 
>>>> http://www.w3.org/2011/prov/track/issues/474
>>>> 
>>>> Raised by: Simon Miles
>>>> On product: prov-dm-constraints
>>>> 
>>>> As requested, I'm submitting an issue where I feel a PROV-Constraints review comment of mine is not completely answered.
>>>> 
>>>> My original comment:
>>>>> Bundles
>>>>> -------
>>>>> F. Section 6.1 seems a bit out of the blue. "The definitions
>>>>> [etc.]... assume a PROV instance with exactly one bundle", and then
>>>>> multiple bundles are handled as exactly the same number of
>>>>> instances. Why? Why is there a connection between number of instances
>>>>> and number of bundles? Why would a bundle be considered to be only one
>>>>> instance? I thought a bundle was an identified set of statements,
>>>>> allowing for provenance of provenance, which seems a distinct matter
>>>>> from whether a set of statements are valid. It seems fine for a user
>>>>> to treat one bundle as one instance if they want to, but there's no
>>>>> reason given why this is the general case.
>>>> 
>>>> Response from editors:
>>>>> I am not sure I understand this comment.  However, I have rewritten
>>>>> slightly the intro of section 6.1.
>>>>> 
>>>>> "The definitions, inferences, and constraints, and the resulting notions of normalization, validity and equivalence, assume a PROV instance that consists of exactly one bundle, the toplevel bundle, containing all PROV statements in the top level of the bundle (that is, not enclosed in a named bundle). In this section, we describe how to deal with PROV instances consisting of multiple named bundles. Briefly, each bundle is handled independently; there is no interaction between bundles from the perspective of applying definitions, inferences, or constraints, computing normal forms, or checking validity or equivalence."
>>>> 
>>>> I agree this is clearer, but I don't feel it answers the key questions in my comment. To put my comment another way: you have explained checking validity where an instance consists of one bundle and of multiple bundles. The two other possibilities I see are:
>>>> (a) A bundle containing multiple instances;
>>>> (b) An instance that is a collection of PROV descriptions with no identifier and so is not a bundle, e.g. a provenance service query result.
>>>> 
>>>> How do we deal with each of these cases? Or, if they cannot occur, why not?
>>>> 
>>>> Thanks,
>>>> Simon
>>>> 
>>> 
>>> 
>>> --
>>> The University of Edinburgh is a charitable body, registered in
>>> Scotland, with registration number SC005336.
>>> 
>> 
>> 
> 


Received on Tuesday, 14 August 2012 09:45:35 UTC