Re: PROV-ISSUE-79 (provenance-uri-contract): what is the contract associated with provenance-uris [Accessing and Querying Provenance] from Graham Klyne on 2011-09-01 (public-prov-wg@w3.org from September 2011)

From: Graham Klyne <GK@ninebynine.org>
Date: Thu, 01 Sep 2011 11:16:49 +0100
To: Luc Moreau <L.Moreau@ecs.soton.ac.uk>
CC: public-prov-wg@w3.org, Paul Groth <p.t.groth@vu.nl>
Message-ID: <4E5F5B91.4020009@ninebynine.org>
Luc,

I think this is an important topic, and my apologies that this response has been 
a long time coming.  (I find I have to compartmentalize my time, or WG email 
discussions can too easily overwhelm me.)

In summary, I've adjusted the position made in my earlier response to Paul, but 
still feel there is an important role for invariance of Provenance assertions.

On 26/08/2011 09:28, Luc Moreau wrote:
>
> Wow! Graham,
>
> This is really a crucial point. Did you have in mind that provenance resources
> were not changing over time?

In the simple case, yes.  But my response to Paul was also allowing that 
information could be added monotonically.  I had understood that this was what 
gave rise to the key requirement that provenance is about *past* events - things 
that are not subject to further change.

> It's probably one of the key reasons where we take opposite views on the PAQ.

Hmm... yes, that would explain our failure to connect.

> I was working on the assumption that provenance resources were changing.
>
> I have two use cases where provenance of something is changing:
> - in PASOA, we used to have asynchronous recording of provenance, hence,
> some assertions might have been asserted, but might not have been recorded.
> When you were querying provenance, you could therefore get different results.

If I understand correctly, provenance assertions once made should never become 
invalid.  I think that's the point of all our discussions about invariance.

Not covered in my response to Paul is the removal of assertions from a resource 
(which would undermine the continuing validity of inferences based on them).

In all the use-cases I had considered, I was expecting the provenance to be 
associated with a URI in such a way that it was potentially available for 
re-examination at a later date (subject to appropriate preservation).

The use-case you raise suggests a URI denoting a dynamic resource that returns a 
provenance snapshot; e.g. if there's one (fixed) URI for the weather in London 
right now, is there another (fixed) URI for the provenance of that weather 
report? And if there is, is the denoted resource strictly a *provenance* resource?

Either way, I expect that the assertions returned by dereferencing a provenance 
URI remain true (to the extent that they are a true reflection of some 
provenance).  This would imply that any resource URIs used by the provenance 
assertions would refer to suitable invariants;  e.g. to say that the report for 
http://example.org/LondonWeather/Now was prepared by http://example.org/Michael 
would probably be an incorrect provenance assertion (i.e. assuming different 
reports are prepared by different reporters).  Rather, such an assertion would 
need to be made about (e.g.) http://example.org/LondonWeather/2011-09-01T22:30 
to have enduring truth.
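
A minimal sketch of this distinction, assuming a hypothetical URI scheme in which a time-stamped snapshot URI is minted from the dynamic one (the function name and the path layout are illustrative only, modelled on the example URIs above):

```python
from datetime import datetime


def snapshot_uri(dynamic_uri: str, when: datetime) -> str:
    """Mint a time-stamped URI from a dynamic one such as .../LondonWeather/Now.

    Hypothetical scheme: the last path segment is replaced by a timestamp,
    so the resulting URI denotes one fixed report rather than "the current one".
    """
    base = dynamic_uri.rsplit("/", 1)[0]
    return f"{base}/{when.strftime('%Y-%m-%dT%H:%M')}"


# A provenance assertion is then made about the snapshot, not the dynamic URI.
report = snapshot_uri("http://example.org/LondonWeather/Now",
                      datetime(2011, 9, 1, 22, 30))
assertion = (report, "preparedBy", "http://example.org/Michael")
```

An assertion about the snapshot URI can stay true after the next report is published; the same assertion about the `Now` URI could not.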

If we accept this invariance of truth of retrieved provenance assertions, the 
remaining question is about the extent to which we always get these assertions 
when dereferencing a provenance URI.  In general, on the Web, it's not possible 
to assert that some particular information will remain available indefinitely. 
So what breaks if we allow subsequent retrieval operations to return different 
(but non-conflicting) information?  I think the answer is that our ability to 
use the provenance information to verify the quality or trustworthiness of some 
particular information is undermined once some particular provenance information 
is no longer available.  I think that to specify requirements on preservation of 
provenance information is beyond the scope of this WG (Paul's point too) - 
applications need to specify these and deal with the consequences.

Which leads me to a conclusion which is not what I set out in my response to 
Paul.  I now offer:

(a) it may be OK to add true information
(b) it may be OK to remove information
(c) it would NOT be OK to *change* information (by which I mean to add 
information that is inconsistent/unsatisfiable with previously published 
information)
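
Rules (a) to (c) can be sketched as a comparison of two retrievals from the same provenance URI. This is only an illustration under strong assumptions: assertions are modelled as plain (subject, predicate, object) tuples, and "inconsistent" is approximated by a hypothetical set of single-valued predicates (`SINGLE_VALUED`), which is not something the WG has defined.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical: predicates assumed to admit at most one value per subject.
SINGLE_VALUED = {"preparedBy"}


def compare_retrievals(earlier: Set[Triple], later: Set[Triple]) -> Dict[str, object]:
    """Classify the differences between two retrievals of a provenance URI."""
    # Values already published for single-valued predicates.
    published = {(s, p): o for (s, p, o) in earlier if p in SINGLE_VALUED}
    # (c) A later assertion conflicts if it gives a different value
    # for a single-valued predicate that was already published.
    conflicting = {
        (s, p, o)
        for (s, p, o) in later
        if (s, p) in published and published[(s, p)] != o
    }
    return {
        "added": later - earlier - conflicting,   # (a) may be OK
        "removed": earlier - later,               # (b) may be OK
        "conflicting": conflicting,               # (c) not OK
        "acceptable": not conflicting,
    }
```

Under this reading, a retrieval that returns extra assertions, or fewer, is still acceptable; only a retrieval that contradicts an earlier one violates the contract.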

> - Second, in this WG, we have not defined yet (and I think we should) what we
> mean by obtaining the provenance of something. Does it include all the uses
> of that thing? I guess that provenance includes the backward graph, but does it
> also include the forward graph?
>
> If the latter (and this is not unreasonable), then provenance of a resource
> changes as the resource is being *used*.

I'm not completely sure what you mean by backward- and forward- graphs here.  If 
you mean information like things from which some object was derived, and also 
things derived from some object, then I agree it's not unreasonable.

I think my position outlined above (and even my earlier position in response to 
Paul) allows for this.

> So, I believe that the provenance of something can change.

Yes, but only in constrained ways.

> But really, I don't know what your notion of a provenance resource is. We could
> define it as static or dynamic (even if provenance of something can change).
> The implications for implementation are substantial though.

If you allow provenance to be completely dynamic, then I'd say the whole debate 
about invariants has been rather pointless.  I think that provenance is only 
useful to the extent that a provenance assertion continues to be true.

I then think our discussion becomes one of how the contextualization of an 
assertion is captured.  I am assuming (and arguing for) contextualization 
through use of URIs of IVPs, so that true provenance assertions remain true. 
You may be expecting the contextualization to be provided through scoping 
mechanisms such as RDF named graphs.  A problem with the latter approach today 
is that we have no accepted formal semantics of RDF that allows us to capture 
the semantics (though there's academic work such as [1] and there have been 
proposals [2] [3] that might provide some steer, and I understand Pat Hayes has 
made a proposal to the current RDF WG [4]).

I note that these approaches (flat graph with URIs of IVPs, and named graphs 
with context) are not mutually exclusive - a dynamic-resource URI may become an 
IVP URI in a contextualized named graph, and lifting rules - e.g. per [1] - 
could be used to map between them.

I do think we should offer developers an option that does not depend on named 
graphs and extended RDF semantics. I suspect that widespread availability of 
tooling to effectively support useful inference across contextualized named 
graphs may be a while coming, assuming that the RDF WG comes up with a model 
that is usable for our purposes. (But I note all this would probably suggest 
some requirements to feed back to the RDF WG).

[1] http://www-formal.stanford.edu/guha/guha-thesis.ps

[2] http://www2005.org/cdrom/docs/p613.pdf

[3] http://dx.doi.org/10.1007/978-3-540-30475-3_4, 
http://citeseer.ist.psu.edu/viewdoc/download?doi=10.1.1.58.2368&rep=rep1&type=pdf

[4] http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs/RDF-Datasets-Proposal#Semantics

#g
--



> If a provenance resource is not supposed to change over time, the provenance-uri
> points to a "frozen provenance", for a given resource in a given state. To
> implement this, you would require provenance services to have a form of
> "roll back" to previous states, to return the provenance as it existed at
> previous points in time. Do we really want that?
>
>
> Luc
>
>
> On 26/08/11 08:48, Graham Klyne wrote:
>> On 26/08/2011 04:51, Paul Groth wrote:
>>> Hi Graham,
>>>
>>> I think the important thing is that we don't say anything about how provenance
>>> information must be maintained. That is to say that the provenance information
>>> referred to by a provenance-uri may change over time.
>>>
>>> If I add some more detailed provenance information about a resource, I can still
>>> use the same provenance-uri.
>>>
>>> Is that correct?
>>
>> Good question.
>>
>> I think the important feature of provenance information (hence provenance
>> resources) possibly even a defining feature, is that if it is ever true, it is
>> always true.
>>
>> Also, valid inferences based on information from a provenance resource must
>> remain valid indefinitely.
>>
>> I think the implications of this are:
>> (a) it may be OK to add true information
>> (b) it would not be OK to remove information
>>
>> But I think the whole issue of explicitly allowing a provenance resource to
>> vary may turn out to be a rathole. I'd rather focus on the essential
>> properties and let the rest sort itself out.
>>
>> #g
>> --
>>
>>> Graham Klyne wrote:
>>>>
>>>> On 23/08/2011 12:05, Provenance Working Group Issue Tracker wrote:
>>>>> PROV-ISSUE-79 (provenance-uri-contract): what is the contract associated with
>>>>> provenance-uris [Accessing and Querying Provenance]
>>>>>
>>>>> http://www.w3.org/2011/prov/track/issues/79
>>>>>
>>>>> Raised by: Luc Moreau
>>>>> On product: Accessing and Querying Provenance
>>>>>
>>>>> The PAQ document indicates that provenance information (sometimes referred to
>>>>> as provenance resource) may change over time.
>>>>
>>>> Where does it say that? If it does, I think it's a mistake.
>>>>
>>>> What's the implication for the provenance-uri? Is the provenance-uri a cool
>>>> URI? I think it is not, but this should be made explicit. There are also
>>>> further issues.
>>>>> Generally, what is the "contract" associated with this provenance-URI? How
>>>>> long should the server be able to serve this URI? It's particularly important
>>>>> for dynamically generated pages.
>>>>
>>>> IMO, any contract for longevity of retrievability of the resource is outside
>>>> scope of the spec. We can't mandate indefinite availability, and nothing else
>>>> would make any sense.
>>>>
>>>> #g
>>>> --
>>>>
>>>>> Let us consider a provenance store, in which provenance assertions get
>>>>> accumulated. Let us consider a static resource, r, but over time, what we
>>>>> know about r changes, so it may have different provenance information p1
>>>>> and p2.
>>>>>
>>>>> If r is accessed, and a provenance-uri is returned, and dereferenced a first
>>>>> time, we obtain p1.
>>>>>
>>>>> If r is accessed again, are we expecting to get the same provenance-uri, or a
>>>>> different one if provenance has changed?
>>>>>
>>>>> Now, let us consider r as a dynamic resource. If r changes between the first
>>>>> and second access, is the same provenance-uri supposed to be returned?
>>>>> If it does not change, how do we ever have a guarantee that the provenance
>>>>> information obtained corresponds to the resource representation we obtained?
>>>>>
>>>>
>>>
>>
>
Received on Thursday, 1 September 2011 11:47:53 UTC