- From: Luc Moreau <l.moreau@ecs.soton.ac.uk>
- Date: Fri, 10 Aug 2012 14:09:36 +0100
- To: James Cheney <jcheney@inf.ed.ac.uk>
- CC: public-prov-wg@w3.org
I quite like '(PROV) document'

Luc

PS tracker, this is ISSUE-477 too

On 10/08/12 13:10, James Cheney wrote:
> Hi Luc,
>
> I agree, it would definitely be better to avoid changes that affect other last call documents; see my response to Ivan just now.
>
> I think the simplest solution is to find a neutral name instead of "dataset", and use that; if it eventually becomes clear that it is safe to use "dataset" we can align.
>
> Here is a concrete proposal.
>
> 1. Instead of "PROV dataset" for the toplevel/default instance + bundles, use "PROV document"
>
> 2. To address issue 477, start a provn document with "provenance" instead of "bundle", and end it with "endProvenance". (This might be nicer for magic number reasons, so that the first four bytes of the document make it easy to tell the type, as long as they're not taken. But this doesn't matter that much since there is a MIME type, I guess.)
>
> Any objection to "PROV document"?
>
> This is ISSUE-477 too.
>
> --James
>
> On Aug 10, 2012, at 12:46 PM, Luc Moreau wrote:
>
>> Hi James,
>> I would be worried about this proposed change since it affects prov-dm/prov-o where we
>> define a bundle as a named set of assertions. We create bundles because we want
>> to talk about the provenance of these assertions. So, a name is an intrinsic element of a bundle.
>>
>> With hindsight, 'toplevel-bundle' was a poor choice of word, since it doesn't have a name.
>> This concept is something that we need in prov-n, because we need to know how to organize statements, and prefix declarations, etc.
>>
>> Introducing unnamed bundle and named bundle in prov-dm is not desirable because prov-dm doesn't say anything about unnamed bundles. Why introduce them?
>>
>> Luc
>>
>> On 10/08/12 12:39, James Cheney wrote:
>>> On Aug 10, 2012, at 12:20 PM, Ivan Herman wrote:
>>>
>>>> Just a side-issue:
>>>>
>>>> On Aug 10, 2012, at 09:40 , James Cheney wrote:
>>>>> [snip]
>>>>> Part of Simon's point was that what I was calling a toplevel-bundle is not really a bundle, just a set of statements. What prov-n is calling a toplevel-bundle is also not really a bundle: it might have multiple bundles or none, along with an unnamed set of statements. So the terminology is confusing (I agree).
>>>>>
>>>>> So I expect Simon might suggest that we avoid the use of toplevel-bundle in prov-n too; if he doesn't, I will. Calling it a dataset would be fine.
>>>>>
>>>> That would lead to a possible confusion. The term 'dataset' is used in the SW world, namely in SPARQL. It *may* be the term adopted by RDF 1.1 for a collection of named graphs and, actually, it *may* be the right abstraction for Prov, too, but... we are not yet sure. And if we end up using the same term but with a different meaning then, well, hell is loose:-)
>>>>
>>> OK. I had the impression that in RDF terms, a PROV instance would literally correspond to a graph, a bundle would roughly correspond to a named graph, and so saying "toplevel bundle" seemed odd since it doesn't have a name. Then a "PROV dataset" would, if represented in RDF, literally be (an example of) an RDF dataset.
>>>
>>> But that is a fair point. Can we make this conditional on staying aligned with RDF terminology? I'd rather not invent three new terms for things that directly align with existing terminology, but I appreciate the concern that we not use the same terminology for slightly different things. So any suggestions for a better term than "dataset" would be welcome.
>>>
>>>> B.t.w., if I use the RDF datasets as an analogy: that consists of (G, (n1,G1),....,(ni,Gi)), where (ni,Gi) is, to use the current terminology, a named graph (that is the term used in SPARQL) and G is the 'default graph'. As an analogy, what about 'default bundle' ?
>>>>
>>> Default bundle might be better than toplevel bundle, but for us, "bundle" has generally meant "named set of statements".
>>>
>>> So another terminology could be:
>>>
>>> - "instance" - the whole thing (toplevel bundle + named bundles)
>>> - "bundle" - any set of statements (named or unnamed)
>>> - "named bundle" -
>>> - "default bundle" - the unnamed set of statements at the toplevel (what we were calling "toplevel bundle" or "toplevel instance" in my recent revision).
>>>
>>> Simon, would that be acceptable instead of "dataset", "instance", "bundle", "toplevel instance"? Are there lots of places where we say that bundles are named, that would have to change to draw this distinction?
>>>
>>> One advantage of this would be that prov-n doesn't need to change (except maybe renaming "toplevel" to "default").
>>>
>>> I think we can probably keep this change independent of technical content, so that we can align (or not) with RDF 1.1 later, in any case.
>>>
>>> --James
>>>
>>>> Ivan
>>>>
>>>>>>
>>>>>> Isn't it the case that an instance (which is a prov-constraint concept and not a prov-n concept)
>>>>>> a set of statement or a bundle or a toplevel-bundle/dataset?
>>>>> I am now proposing that we use "instance" solely for "set of statements". If this term is only used in this sense in prov-constraints, then it seems that we are free to redefine it, within reason. Most of the document concerns instances, so the number of changes was small. For cohesion, if we talk about sets of statements elsewhere it might be sensible to call them "instances", but I don't insist on it, nor do I insist on the use of "dataset" elsewhere.
>>>>>
>>>>> --James
>>>>>
>>>>>> Luc
>>>>>>
>>>>>> On 09/08/12 18:03, James Cheney wrote:
>>>>>>> OK. I have done a quick pass to use the term "PROV dataset" and changed all occurrences of "toplevel bundle" to "toplevel instance". I think it's a lot better this way!
>>>>>>>
>>>>>>> instance = named set of statements. (Excluding "bundle" constructs, which are not statements.)
>>>>>>> bundle = named set of statements ~= named graph of PROV-O (hopefully!)
>>>>>>> dataset = an instance and zero or more bundles (with distinct names).
>>>>>>> toplevel instance = the set of statements at the toplevel of a dataset
>>>>>>>
>>>>>>> Modulo typos/snags, does this look OK? If so I will close.
>>>>>>>
>>>>>>> Perhaps this terminology would be useful in other documents (Luc pointed out PROV-N uses "toplevel bundle" too...).
>>>>>>>
>>>>>>> --James
>>>>>>>
>>>>>>> On Aug 9, 2012, at 5:41 PM, Miles, Simon wrote:
>>>>>>>
>>>>>>>> Hello James,
>>>>>>>>
>>>>>>>> I strongly agree with the suggested general solution. I have no objection to "dataset" as a term. If you do still need to talk about bundles at all in PROV-Constraints, I think it should be made clear that the "toplevel" does not need to be named (does not need to be a bundle) to avoid confusion of concepts for different purposes.
>>>>>>>>
>>>>>>>> As said on the IRC, I don't think this is a blocking issue, just a matter of text clarification.
>>>>>>>>
>>>>>>>> thanks,
>>>>>>>> Simon
>>>>>>>>
>>>>>>>> Dr Simon Miles
>>>>>>>> Senior Lecturer, Department of Informatics
>>>>>>>> Kings College London, WC2R 2LS, UK
>>>>>>>> +44 (0)20 7848 1166
>>>>>>>>
>>>>>>>> Evolutionary Testing of Autonomous Software Agents:
>>>>>>>> http://eprints.dcs.kcl.ac.uk/1370/
>>>>>>>> ________________________________________
>>>>>>>> From: James Cheney [jcheney@inf.ed.ac.uk]
>>>>>>>> Sent: 09 August 2012 17:21
>>>>>>>> To: Provenance Working Group
>>>>>>>> Subject: Re: PROV-ISSUE-474 (instances-and-bundles): Bundles and valid instances [prov-dm-constraints]
>>>>>>>>
>>>>>>>> We discussed this in the teleconference and it sounded like it would be appropriate to find better terminology for the following three things, which are currently not clearly distinguished:
>>>>>>>>
>>>>>>>> - "the whole PROV instance, including set of toplevel statements and bundles"
>>>>>>>> - "a particular set of statements, either the toplevel one or one within a bundle"
>>>>>>>> - bundle = "a named set of provenance statements"
>>>>>>>>
>>>>>>>> My initial proposal is "PROV dataset", "PROV instance", and "bundle". I believe "PROV dataset" is roughly analogous to what people call "dataset" in the context of SPARQL; if anyone knows different (or has objections or better suggestions), let me know.
>>>>>>>>
>>>>>>>> I'll send another message on this when this is ready for review.
>>>>>>>>
>>>>>>>> --James
>>>>>>>>
>>>>>>>> On Aug 9, 2012, at 3:45 PM, Provenance Working Group Issue Tracker wrote:
>>>>>>>>
>>>>>>>>> PROV-ISSUE-474 (instances-and-bundles): Bundles and valid instances [prov-dm-constraints]
>>>>>>>>>
>>>>>>>>> http://www.w3.org/2011/prov/track/issues/474
>>>>>>>>>
>>>>>>>>> Raised by: Simon Miles
>>>>>>>>> On product: prov-dm-constraints
>>>>>>>>>
>>>>>>>>> As requested, I'm submitting an issue where I feel a PROV-Constraints review comment of mine is not completely answered.
>>>>>>>>>
>>>>>>>>> My original comment:
>>>>>>>>>> Bundles
>>>>>>>>>> -------
>>>>>>>>>> F. Section 6.1 seems a bit out of the blue. "The definitions
>>>>>>>>>> [etc.]... assume a PROV instance with exactly one bundle", and then
>>>>>>>>>> multiple bundles are handled as exactly the same number of
>>>>>>>>>> instances. Why? Why is there a connection between number of instances
>>>>>>>>>> and number of bundles? Why would a bundle be considered to be only one
>>>>>>>>>> instance? I thought a bundle was an identified set of statements,
>>>>>>>>>> allowing for provenance of provenance, which seems a distinct matter
>>>>>>>>>> from whether a set of statements are valid. It seems fine for a user
>>>>>>>>>> to treat one bundle as one instance if they want to, but there's no
>>>>>>>>>> reason given why this is the general case.
>>>>>>>>> Response from editors:
>>>>>>>>>> I am not sure I understand this comment. However, I have rewritten
>>>>>>>>>> slightly the intro of section 6.1.
>>>>>>>>>>
>>>>>>>>>> "The definitions, inferences, and constraints, and the resulting notions of normalization, validity and equivalence, assume a PROV instance that consists of exactly one bundle, the toplevel bundle, containing all PROV statements in the top level of the bundle (that is, not enclosed in a named bundle). In this section, we describe how to deal with PROV instances consisting of multiple named bundles. Briefly, each bundle is handled independently; there is no interaction between bundles from the perspective of applying definitions, inferences, or constraints, computing normal forms, or checking validity or equivalence."
>>>>>>>>> I agree this is clearer, but I don't feel it answers the key questions in my comment. To put my comment another way: you have explained checking validity where an instance consists of one bundle and of multiple bundles. The two other possibilities I see are:
>>>>>>>>> (a) A bundle containing multiple instances;
>>>>>>>>> (b) An instance that is a collection of PROV descriptions with no identifier and so is not a bundle, e.g. a provenance service query result.
>>>>>>>>>
>>>>>>>>> How do we deal with each of these cases? Or, if they cannot occur, why not?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Simon
>>>>>>>>>
>>>>>>>> --
>>>>>>>> The University of Edinburgh is a charitable body, registered in
>>>>>>>> Scotland, with registration number SC005336.
>>>>>>>>
>>>>>> --
>>>>>> Professor Luc Moreau
>>>>>> Electronics and Computer Science   tel: +44 23 8059 4487
>>>>>> University of Southampton          fax: +44 23 8059 2865
>>>>>> Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
>>>>>> United Kingdom                     http://www.ecs.soton.ac.uk/~lavm
>>>>>>
>>>>> --
>>>>> The University of Edinburgh is a charitable body, registered in
>>>>> Scotland, with registration number SC005336.
>>>>>
>>>> ----
>>>> Ivan Herman, W3C Semantic Web Activity Lead
>>>> Home: http://www.w3.org/People/Ivan/
>>>> mobile: +31-641044153
>>>> FOAF: http://www.ivan-herman.net/foaf.rdf
>>>>
>> --
>> Professor Luc Moreau
>> Electronics and Computer Science   tel: +44 23 8059 4487
>> University of Southampton          fax: +44 23 8059 2865
>> Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
>> United Kingdom                     http://www.ecs.soton.ac.uk/~lavm
>>
>

--
Professor Luc Moreau
Electronics and Computer Science   tel: +44 23 8059 4487
University of Southampton          fax: +44 23 8059 2865
Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
United Kingdom                     http://www.ecs.soton.ac.uk/~lavm
Received on Friday, 10 August 2012 13:10:23 UTC
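
For readers following the thread, the terminology proposed in it (a set of statements, a named set of statements, and a container holding one unnamed toplevel set plus zero or more bundles with distinct names) can be sketched as a small data model. This is only an illustration in Python, not anything taken from the PROV specifications: the class names Instance, Bundle and Document, the method names, and the use of plain strings as statements are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Instance:
    """A set of PROV statements (plain strings stand in for statements here)."""
    statements: frozenset = frozenset()


@dataclass(frozen=True)
class Bundle:
    """A named set of statements; the name is intrinsic to a bundle."""
    name: str
    instance: Instance


@dataclass
class Document:
    """Hypothetical 'PROV document': an unnamed toplevel instance
    plus zero or more bundles with distinct names."""
    toplevel: Instance = field(default_factory=Instance)
    bundles: dict = field(default_factory=dict)

    def add_bundle(self, bundle: Bundle) -> None:
        # Bundle names must be distinct within one document.
        if bundle.name in self.bundles:
            raise ValueError(f"duplicate bundle name: {bundle.name}")
        self.bundles[bundle.name] = bundle

    def instances(self):
        """Yield the toplevel instance and each bundle's instance,
        so that each can be handled independently of the others."""
        yield self.toplevel
        for bundle in self.bundles.values():
            yield bundle.instance
```

In this sketch the toplevel set of statements is an Instance but not a Bundle, which mirrors the point made in the thread that the toplevel has no name.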