Re: PROV Last Call - RDF WG review request from David Wood on 2012-10-16 (public-rdf-wg@w3.org from October 2012)

From: David Wood <david@3roundstones.com>
Date: Tue, 16 Oct 2012 09:09:31 -0400
To: Antoine Zimmermann <antoine.zimmermann@emse.fr>
Cc: "public-rdf-wg@w3.org" <public-rdf-wg@w3.org>, Richard Cyganiak <richard@cyganiak.de>, Ivan Herman <ivan@w3.org>
Message-Id: <53058AD2-E1FC-4EAC-A687-F9317F608D45@3roundstones.com>
On Oct 16, 2012, at 8:49, Antoine Zimmermann <antoine.zimmermann@emse.fr> wrote:

> Richard,
> 
> 
> I do not see why you are comparing bundles to gboxes. We are not standardising gboxes; we standardise RDF Datasets, which are immutable structures. They are rigid and do not change, just like PROV bundles, RDF Graphs, OWL Ontologies or FOL formulas.
> Usually, given a set of formulas in a logical language, a term denotes one and only one thing (possible exception: punning) so of course a bundle IRI denotes only one thing.
> 
> But the question is whether the bundle identifier, when put inside a PROV document, is constrained to denote exactly the set of PROV statements that is inside the syntactic delimiter of the bundle. I do not see the evidence that it is the case.
> 
> I do not see anything that would contradict that:
> 
> bundle1 {
>  alternate(e1,e2)
> }
> 
> and:
> 
> bundle1 {
>  alternate(e2,e1)
> }
> 
> could be simultaneously valid and consistent with each other. I can accept that a bundle identifier identifies a unique set of PROV statements, but not necessarily the one that is written in a PROV document.

Isn't the PROV WG saying that the bundle identifier identifies a specific bundle by fiat?

> 
> If I misread something and you can show me that this is inconsistent with the PROV spec, please point to the relevant parts.

Sorry (traveling). 

Regards,
Dave

> 
> BTW, the fact that PROV-C redefines in its own way and with its own terms all the notions they borrow from logics (sometimes in disagreement with the standard definitions) is not simplifying things, IMHO.
> 
> Just to give an example, the definition of "equivalence" is at odds with what equivalence is normally understood to mean. In PROV-C, there are instances that are not equivalent to themselves. I've never seen in my life a notion of equivalence, be it in logic, mathematics or otherwise, where equivalence is not reflexive. The reason, if you want to know, is that PROV-equivalence requires that the equivalent entities both be valid, "valid" being the PROV term for "consistent".
> 
> 
> AZ
> 
> 
> 
> Le 13/10/2012 16:10, Richard Cyganiak a écrit :
>> Antoine,
>> 
>> My point is that unlike g-boxes, bundles cannot change their contents
>> over time, and in that sense bundle names are rigid. This is
>> different from the designs that we have discussed, because in our
>> designs, we had the use case of wanting to associate graph IRIs with
>> the contents of the web document identified by the IRI, and the
>> contents of web documents change. PROV doesn't have this use case. A
>> bundle is a PROV-specific construct, and I couldn't see any
>> indication in any of the PROV specs that they think of the PROV
>> document that you get by dereferencing an IRI as a bundle with that
>> IRI as a name.
>> 
>> Now, about inference and all that stuff.
>> 
>> PROV-C doesn't define a notion for the truth of, or entailment
>> between, PROV documents. This simplifies things considerably.
>> 
>> PROV-C *does* define a notion of *equivalence* for PROV documents.
>> The definition requires that the bundle contents be *equivalent*.
>> They don't have to be the same syntactic statements. Bundle contents
>> (sets of PROV statements, a.k.a. PROV instances) are equivalent if
>> they have the same normal form. If you “materialize all inferences”
>> in a PROV instance, you get a new and different PROV instance that
>> still has the same normal form.
>> 
>> So, in PROV semantics translated to RDF datasets, :g { G1 } and :g {
>> G2 } are equivalent if G1 and G2 are equivalent. The question whether
>> such pairs entail each other is meaningless in PROV semantics.
>> 
>> The PROV design thus actually pretty closely mirrors our resolution
>> that essentially datasets are not logical expressions, but syntactic
>> containers for logical expressions.
>> 
>> Best, Richard
>> 
>> 
>> 
>> On 12 Oct 2012, at 18:42, Antoine Zimmermann wrote:
>> 
>>> Ok, let us just assume that PROV statements are like RDF statements
>>> and avoid this distinction. In this case, I would still disagree
>>> that the graph IRI would denote the set of statements, for various
>>> reasons, including what is written in PROV-CONSTRAINTS (later
>>> abbreviated as PROV-C). In PROV-C, it is said:
>>> 
>>> "When processing provenance, an application may apply the
>>> inferences and definitions in section 4. Definitions and
>>> Inferences."
>>> 
>>> What "apply" means is not formally defined, but from the
>>> informational section 2, I understand that "applying" an inference
>>> means adding the inferred statements to the instance:
>>> 
>>> "we can often /apply/ the formula to the instance to produce
>>> another instance that does satisfy the formula"
>>> 
>>> "The process of applying definitions, inferences, and constraints
>>> to a PROV instance until all of them are satisfied is similar to
>>> what is sometimes called /chasing/ [DBCONSTRAINTS] or /saturation/
>>> [CHR]."
>>> 
>>> PROV-C also says that inferences are done on a per-bundle basis.
>>> So, I understand the PROV-C spec to be saying that a bundle is
>>> equivalent to a bundle that contains inferred statements. This seems
>>> to clash with the idea that the graph IRI denotes the actual set
>>> inside the written bundle and nothing else.
>>> 
>>> Any implementation that would materialise the inferred PROV
>>> statements according to the PROV-CONSTRAINTS rules would be doing
>>> something incorrect if the graph IRI denoted the actual set of
>>> statements.
>>> 
>>> Yet, I understand that, provenance-wise, one would like to
>>> distinguish a raw provenance and a
>>> provenance-with-materialised-inferences. In which case, there is
>>> something more needed.
>>> 
>>> 
>>> AZ.
>>> 
>>> 
>>> Le 12/10/2012 17:54, Richard Cyganiak a écrit :
>>>> On 12 Oct 2012, at 17:12, Antoine Zimmermann wrote:
>>>>> I'm not sure whether they actually want the "name" to denote
>>>>> the graph.
>>>> 
>>>> I'm pretty sure they do.
>>>> 
>>>> AFAICT, PROV has a general philosophy that goes like, “if it
>>>> changes, it's a new entity”, and I read their spec as saying that
>>>> bundle names are really meant to be rigidly connected to a
>>>> particular set of provenance descriptions.
>>>> 
>>>> Whether these provenance descriptions are expressed as triples
>>>> or PROV-N assertions seems secondary and interchangeable.
>>>> 
>>>> This doesn't mean that the “static g-box” approach wouldn't have
>>>> worked for them.
>>>> 
>>>> Best, Richard
>>>> 
>>>> 
>>>>> They certainly want it to denote a "bundle", which indeed will
>>>>> /contain/ RDF triples, but may be distinct from an RDF graph
>>>>> (especially since "bundles" do not consist of triples in the
>>>>> abstract syntax). At least, it is the way I interpret it, and
>>>>> it is the way I would like it to be. It gives more flexibility
>>>>> as the graph IRI is not rigidly fixed to the exact set of
>>>>> triples provided in a particular RDF dataset. With this view,
>>>>> it is even less a problem that we do not tell them what the
>>>>> graph IRI denotes.
>>>>> 
>>>>> 
>>>>> AZ.
>>>>> 
>>>>> 
>>>>> 
>>>>>> 
>>>>>> Best, Richard
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>> Pat
>>>>>>> 
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Funny enough, PROV-O has some examples that use TriG
>>>>>>>>> syntax. They don't say what the syntax is, and don't
>>>>>>>>> reference any spec that defines the syntax -- they
>>>>>>>>> just provide the examples without comment on the
>>>>>>>>> syntax.
>>>>>>>> 
>>>>>>>> That has already been raised as an issue on the LC
>>>>>>>> documents (by me:-) and these will disappear in the CR
>>>>>>>> version of the document.
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> Best, Richard
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> AZ.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> -- Sandro
>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> AZ.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> -- Sandro
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> -AZ
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Pat
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> from using named graphs and RDF datasets
>>>>>>>>>>>>>>>> for their bundle. But it's quite the
>>>>>>>>>>>>>>>> opposite: we have voted for the absence
>>>>>>>>>>>>>>>> of constraints!
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> So they can use the RDF dataset data
>>>>>>>>>>>>>>>> structure the way they want. They simply
>>>>>>>>>>>>>>>> have to be warned that they should not
>>>>>>>>>>>>>>>> assume any particular meaning for a
>>>>>>>>>>>>>>>> dataset. Therefore, if they want to use
>>>>>>>>>>>>>>>> this for bundles, they'll have to
>>>>>>>>>>>>>>>> completely describe all the constraints
>>>>>>>>>>>>>>>> they require when defining a provenance
>>>>>>>>>>>>>>>> dataset. Whatever constraints they define
>>>>>>>>>>>>>>>> will be consistent with the RDF specs,
>>>>>>>>>>>>>>>> since our set of constraints regarding
>>>>>>>>>>>>>>>> datasets is empty.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> So, I'd have no problem telling them to
>>>>>>>>>>>>>>>> go ahead and use datasets, and be
>>>>>>>>>>>>>>>> specific in what it means in the context
>>>>>>>>>>>>>>>> of provenance data.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> --AZ
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Le 05/10/2012 05:40, Pat Hayes a écrit :
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Oct 4, 2012, at 3:24 PM, David Wood
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Hi Pat,
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Oct 4, 2012, at 15:55, Pat
>>>>>>>>>>>>>>>>>> Hayes <phayes@ihmc.us> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> David, greetings.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> I have been waiting for the WG to
>>>>>>>>>>>>>>>>>>> make a decision about datasets and
>>>>>>>>>>>>>>>>>>> named graphs before getting back to
>>>>>>>>>>>>>>>>>>> the PROV group, as this is the most
>>>>>>>>>>>>>>>>>>> relevant to their 'bundle' feature.
>>>>>>>>>>>>>>>>>>> As far as I can see, our recent
>>>>>>>>>>>>>>>>>>> decision to give no semantics to
>>>>>>>>>>>>>>>>>>> datasets means that we contribute
>>>>>>>>>>>>>>>>>>> nothing to this, and the PROV group
>>>>>>>>>>>>>>>>>>> are on their own to invent their
>>>>>>>>>>>>>>>>>>> own graph naming construct and give
>>>>>>>>>>>>>>>>>>> it the semantics they want,
>>>>>>>>>>>>>>>>>>> independently from the output of
>>>>>>>>>>>>>>>>>>> this WG.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Do you concur?
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Hmm. A bundle is "a named set of
>>>>>>>>>>>>>>>>>> descriptions, but it is also an
>>>>>>>>>>>>>>>>>> entity so that its provenance can be
>>>>>>>>>>>>>>>>>> described." [1] A SPARQL dataset
>>>>>>>>>>>>>>>>>> "represents a collection of graphs"
>>>>>>>>>>>>>>>>>> and "comprises one graph, the default
>>>>>>>>>>>>>>>>>> graph, which does not have a name,
>>>>>>>>>>>>>>>>>> and zero or more named graphs, where
>>>>>>>>>>>>>>>>>> each named graph is identified by an
>>>>>>>>>>>>>>>>>> IRI." [2]
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> There is clearly overlap there, but
>>>>>>>>>>>>>>>>>> I don't think the overlap is anywhere
>>>>>>>>>>>>>>>>>> near complete. It doesn't appear that
>>>>>>>>>>>>>>>>>> the WG is willing to equate a "named
>>>>>>>>>>>>>>>>>> set of descriptions" with a
>>>>>>>>>>>>>>>>>> "collection of graphs" nor to
>>>>>>>>>>>>>>>>>> presuppose some way to then give the
>>>>>>>>>>>>>>>>>> dataset a name via an IRI.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Right. And it seems to me that it is
>>>>>>>>>>>>>>>>> the second part that really matters. In
>>>>>>>>>>>>>>>>> their original request for comment
>>>>>>>>>>>>>>>>> they particularly mentioned named
>>>>>>>>>>>>>>>>> graphs as a topic of interest in
>>>>>>>>>>>>>>>>> connection with bundles, and I took
>>>>>>>>>>>>>>>>> them to be interested in the
>>>>>>>>>>>>>>>>> possibility that named graphs could be
>>>>>>>>>>>>>>>>> used to construct bundles or implement
>>>>>>>>>>>>>>>>> them in RDF in a natural way. I think,
>>>>>>>>>>>>>>>>> now, the only possible answer is, no.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> So, it appears to me that we have
>>>>>>>>>>>>>>>>>> problems with the PROV-DM document's
>>>>>>>>>>>>>>>>>> definition of a Bundle from at least
>>>>>>>>>>>>>>>>>> two perspectives: We don't have
>>>>>>>>>>>>>>>>>> semantics for datasets, nor do we
>>>>>>>>>>>>>>>>>> have a syntax that we could equate to
>>>>>>>>>>>>>>>>>> a bundle.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I don't think they were expecting to
>>>>>>>>>>>>>>>>> find a ready-made bundle in RDF, but
>>>>>>>>>>>>>>>>> there is now nothing in RDF which would
>>>>>>>>>>>>>>>>> even be of utility or help in creating
>>>>>>>>>>>>>>>>> bundles, AFAIKS. They will have to
>>>>>>>>>>>>>>>>> define their own extension to RDF and
>>>>>>>>>>>>>>>>> give it a purpose-built semantics of
>>>>>>>>>>>>>>>>> their own.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Pat
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> TriG (as currently conceptualized)
>>>>>>>>>>>>>>>>>> could provide a syntax for a bundle
>>>>>>>>>>>>>>>>>> iff we decide to adopt some way to
>>>>>>>>>>>>>>>>>> name the package itself (as some
>>>>>>>>>>>>>>>>>> extant systems do, by assigning an
>>>>>>>>>>>>>>>>>> IRI upon ingest). I think both of
>>>>>>>>>>>>>>>>>> those rather unlikely at this time,
>>>>>>>>>>>>>>>>>> although I don't think implementors
>>>>>>>>>>>>>>>>>> will cease doing so (because it is
>>>>>>>>>>>>>>>>>> useful).
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Of course, I could be wrong since my
>>>>>>>>>>>>>>>>>> reading is still incomplete.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Regards, Dave
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>> http://www.w3.org/TR/prov-dm/#term-bundle-entity
>>> 
> [2]
>>>>>>>>>>>>>>>>>> http://www.w3.org/TR/sparql11-query/#rdfDataset
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Pat
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On Oct 4, 2012, at 2:33 PM, David
>>>>>>>>>>>>>>>>>>> Wood wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Thanks, Paul. We'll get back to
>>>>>>>>>>>>>>>>>>>> you shortly, hopefully prior to
>>>>>>>>>>>>>>>>>>>> your 10 Oct deadline.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Regards, Dave
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Oct 4, 2012, at 14:52, Paul
>>>>>>>>>>>>>>>>>>>> Groth <p.t.groth@vu.nl> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Hi Dave,
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> We had specific questions
>>>>>>>>>>>>>>>>>>>>> about PROV-DM and PROV-O that
>>>>>>>>>>>>>>>>>>>>> we are keen on getting
>>>>>>>>>>>>>>>>>>>>> answered.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> From the email to the RDF WG
>>>>>>>>>>>>>>>>>>>>> chairs on July 24, 2012:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> "We particularly wanted to
>>>>>>>>>>>>>>>>>>>>> call your attention to the
>>>>>>>>>>>>>>>>>>>>> Bundle feature [5].
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Questions we have are:
>>>>>>>>>>>>>>>>>>>>> - We are hopeful that the notion of
>>>>>>>>>>>>>>>>>>>>>   Bundle should map to the notion
>>>>>>>>>>>>>>>>>>>>>   of graph you are defining. Can
>>>>>>>>>>>>>>>>>>>>>   you look into this?
>>>>>>>>>>>>>>>>>>>>> - In particular, with respect to
>>>>>>>>>>>>>>>>>>>>>   Bundle do you see the
>>>>>>>>>>>>>>>>>>>>>   construct Mention[6] as
>>>>>>>>>>>>>>>>>>>>>   compatible with RDF now and
>>>>>>>>>>>>>>>>>>>>>   going forward?
>>>>>>>>>>>>>>>>>>>>> - PROV-DM is dependent on rdf
>>>>>>>>>>>>>>>>>>>>>   types[7]. Do you envisage any
>>>>>>>>>>>>>>>>>>>>>   further changes in the rdf
>>>>>>>>>>>>>>>>>>>>>   data types?
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> In addition, any feedback on
>>>>>>>>>>>>>>>>>>>>> the PROV-Ontology document is
>>>>>>>>>>>>>>>>>>>>> greatly appreciated."
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Similarly, in prov-constraints
>>>>>>>>>>>>>>>>>>>>> we wondered about Bundle and
>>>>>>>>>>>>>>>>>>>>> specifically whether the terminology
>>>>>>>>>>>>>>>>>>>>> of Document and Bundle works with the
>>>>>>>>>>>>>>>>>>>>> terms you will use in RDF. For
>>>>>>>>>>>>>>>>>>>>> example, I have heard that the
>>>>>>>>>>>>>>>>>>>>> term dataset will be used.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> We are keen on getting feedback
>>>>>>>>>>>>>>>>>>>>> as soon as possible so that our
>>>>>>>>>>>>>>>>>>>>> CR document is in line with
>>>>>>>>>>>>>>>>>>>>> what is forthcoming in RDF.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Thanks Paul
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> On Thu
Received on Tuesday, 16 October 2012 13:10:18 UTC
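
The normal-form equivalence that Richard describes in the thread above can be sketched in a few lines of Python. This is only an illustration, not PROV-CONSTRAINTS itself: it assumes a single toy inference rule (that alternate is symmetric), and the names Statement, normalize and equivalent are hypothetical. Under that assumption, Antoine's two versions of bundle1 are different sets of statements but share a normal form, and materialising the inference does not change the normal form either.

```python
from typing import FrozenSet, Iterable, Tuple

# Hypothetical representation: a PROV statement as (relation, arg1, arg2).
Statement = Tuple[str, str, str]


def normalize(instance: Iterable[Statement]) -> FrozenSet[Statement]:
    """Toy normal form: close the instance under one assumed rule,
    namely that alternate(a, b) implies alternate(b, a)."""
    statements = set(instance)
    closed = set(statements)
    for relation, a, b in statements:
        if relation == "alternate":
            closed.add((relation, b, a))
    return frozenset(closed)


def equivalent(i1: Iterable[Statement], i2: Iterable[Statement]) -> bool:
    """Two instances are treated as equivalent if their normal forms match."""
    return normalize(i1) == normalize(i2)


# The two bundle contents from Antoine's example.
bundle_a = {("alternate", "e1", "e2")}
bundle_b = {("alternate", "e2", "e1")}

# "Materialising all inferences" yields a syntactically different instance.
materialized = set(normalize(bundle_a))

print(bundle_a == bundle_b)                # different sets of statements
print(equivalent(bundle_a, bundle_b))      # same toy normal form
print(equivalent(bundle_a, materialized))  # materialisation preserves it
```

The sketch deliberately leaves out validity checking, so it does not reproduce the non-reflexive behaviour Antoine objects to; it only shows why syntactic identity and equivalence come apart.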