Re: On provenance access and web architecture from Graham Klyne on 2011-05-20 (public-prov-wg@w3.org from May 2011)

From: Graham Klyne <GK@ninebynine.org>
Date: Fri, 20 May 2011 17:00:19 +0100
To: Luc Moreau <L.Moreau@ecs.soton.ac.uk>
CC: public-prov-wg@w3.org
Message-ID: <4DD69013.8010808@ninebynine.org>
Luc Moreau wrote:
> Hi Graham,
> 
> You are raising lots of good questions.
> 
> But is it the case that provenance should always be accessible by HTTP? 
> Can't it be
> embedded in a document itself?  These are issues that the xg report was 
> trying to discuss.

Indeed.  There are two points you raise here, both valid:

1. Access by HTTP or some other mechanism?  Sure.  I talk about Web 
dereferencing, which allows HTTP or other mechanisms (though the suggested 
POWDER mechanism for provenance discovery is HTTP-specific).

2. Separate resource and provenance, or embedded?  Both are possible. I think 
the separate case is a MUST, but my resource case (4) acknowledges the 
possibility of embedding.  I just don't discuss mechanisms for embedding.
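
To make the separate-resource case concrete, here is a minimal sketch (Python, 
standard library only) of discovery via an HTTP Link header, along the lines of 
the POWDER mechanism.  It is illustrative rather than a proposal: the link 
relation name "provenance" is a placeholder and not a registered type, and the 
function name and example URIs are invented for the example.  The parser covers 
only the common header shapes, not every corner of the Link header grammar.

```python
import re
from urllib.parse import urljoin

# Hypothetical link relation type; nothing of this name is registered.
PROVENANCE_REL = "provenance"

# One link-value: <target> followed by zero or more ;name=value parameters.
_LINK = re.compile(
    r'<([^>]*)>((?:\s*;\s*[\w*-]+(?:\s*=\s*(?:"[^"]*"|[^\s";,]+))?)*)'
)
# One parameter: ;name, optionally with a quoted or bare value.
_PARAM = re.compile(r';\s*([\w*-]+)(?:\s*=\s*(?:"([^"]*)"|([^\s";,]+)))?')


def provenance_uris(resource_uri, link_header, rel=PROVENANCE_REL):
    """Return absolute URIs of links in link_header whose rel includes `rel`.

    resource_uri is the URI the header was received for; relative link
    targets are resolved against it.
    """
    found = []
    for target, params in _LINK.findall(link_header):
        for name, quoted, bare in _PARAM.findall(params):
            if name.lower() == "rel":
                # rel may hold several space-separated relation types.
                if rel in (quoted or bare).lower().split():
                    found.append(urljoin(resource_uri, target))
                break  # only the first rel parameter is considered
    return found


header = (
    '<http://example.org/doc.css>; rel="stylesheet", '
    '</prov/doc>; rel="provenance"; type="application/rdf+xml"'
)
print(provenance_uris("http://example.org/doc", header))
```

In practice the header value would come from a HEAD or GET response for the 
resource, and an empty result would simply mean that no provenance is 
advertised by this mechanism.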

> It would be good to revisit them, even if it is to agree that we are 
> always to download provenance
> from a URL.

I don't mean to ignore them, and I don't think my suggestions are fundamentally 
at odds with the XG report.  But my approach was trying to start from simple 
(dare I say "obvious"?) web mechanisms that I don't think we can avoid, then to 
suggest that some of the other cases might be addressed in response to 
identified problems.

I'm just trying to suggest a very simple starting point that we can use to get 
something concrete on the table.

I do think the discussion of "6.3 Provenance Passing" doesn't really address the 
practical advantages and disadvantages of the different approaches, and seems to 
make unstated assumptions about the nature of provenance information (e.g. by 
giving considerable weight to rapidly-changing dynamic resources, which I think 
at this stage are an edge case).  To my mind, it has more of an application 
design perspective than an interoperability specification perspective, and as a 
standards group I think our primary concern needs to be the latter.

For another example, "6.5 Provenance Negotiation" suggests an additional 
dimension of content negotiation.  But the implied problem here (too much 
provenance information) might be adequately addressed by a provenance query 
facility, which is already noted in the WG charter.  If we don't need to extend 
HTTP, we shouldn't do it.  We need to better understand the problem before 
considering such an approach - and if such an extension is needed, we'd be 
better placed to choose one that is most easily deployed in the current Web. 
(Deployability is another key concern in standards development work.)

To re-iterate, I don't think the XG report is wrong, I just think there may be a 
simpler, quicker approach to the issue, upon which we can build as requirements 
are clarified.

#g
--

> The xg final report had to be limited in size. Regarding architecture, 
> we made a presentation
> which I have uploaded on the wiki [1].  It may clarify a few things.
> 
> Cheers,
> Luc
> 
> 
> [1] http://www.w3.org/2011/prov/wiki/File:Xg-architecture.pdf
> 
> 
> 
> On 05/19/2011 05:45 PM, Graham Klyne wrote:
>> I've been thinking a bit about our discussions about using the 
>> provenance XG final report, specifically section 6 
>> (http://www.w3.org/2005/Incubator/prov/XGR-prov-20101214/#Provenance_in_Web_Architecture). 
>>
>>
>> In this message, I hope to stand back a little and sketch an initial 
>> approach to how provenance access can be addressed within the web 
>> architecture.  I don't claim to offer a complete solution, but it is 
>> one that I suspect will be sufficient in a large number of cases.  
>> Rather than starting from the provenance XG final report, I start from 
>> this working group's charter and the published architecture of the 
>> world wide web.  This is not necessarily incompatible with the 
>> direction suggested by the provenance XG final report, but I think the 
>> emphasis and perspective may be rather different.
>>
>> ...
>>
>> The charter for this WG says:
>>
>> "... specifies how provenance can be accessed or queried in embedded 
>> documents and from remote services. Specifically, it defines how to 
>> access provenance embedded in an HTML document using RDFa, how to 
>> access provenance from a service by means of HTTP, and how to query 
>> provenance through a SPARQL endpoint."
>> -- http://www.w3.org/2011/01/prov-wg-charter
>>
>> To my mind, the starting point for accessing provenance information on 
>> the web should be simple:  just use HTTP.  The remaining issues, then, 
>> are (a) how to know that provenance information is available, and (b) 
>> what URI to use to retrieve it.  And POWDER seems to address these 
>> concerns.  The charter suggests some variations/extensions of this 
>> idea, but I'd like to focus first on the simple case.
>>
>> I take http://www.w3.org/TR/webarch/ (AWWW) as my starting point for a 
>> description of web architecture.  Right at the start (section 2), this 
>> document addresses identification, and asserts "To benefit from and 
>> increase the value of the World Wide Web, agents should provide URIs 
>> as identifiers for resources."
>>
>> ...
>>
>> So I think that one of the first questions to ask concerning how 
>> provenance access works within the web architecture is:
>>
>> "What resources do we recognize and identify with URIs?"
>>
>> My answer would start with:
>> (1) resources about which we wish to assert provenance information
>> (2) resources that are (contain?) provenance information about other 
>> resources (to be useful, we would generally assume these are 
>> dereferenceable on the web).
>>
>> The charter also suggests "provenance ... in embedded documents ... 
>> specifically ... RDFa".  I think this needs clarification, but suggests:
>> (3) resources that contain both textual information and provenance 
>> information
>>
>> Thinking about embedded provenance also suggests:
>> (4) resources that contain a resource (state) representation *and* 
>> provenance about that resource
>>
>> The discussion that follows makes no assumptions about the data 
>> format of resource or provenance data used (cf. AWWW section 5.1).
>>
>> ...
>>
>> Next question: "How can provenance information be accessed"?
>>
>> Having identified resources and URIs, I think the initial mechanism 
>> for retrieving information given the corresponding URI is simple:  
>> just use web retrieval mechanisms.   This (and more) is discussed in 
>> AWWW section 3.
>>
>> Specifically, given the URI of a provenance resource, a simple 
>> mechanism is to use HTTP to retrieve a representation of that provenance.
>>
>> ...
>>
>> The next question I then see is:  "given some resource URI, how do I 
>> discover if there is provenance information associated with the 
>> resource, and what URI can I use to retrieve that provenance 
>> information?"
>>
>> Looking for existing, established web protocols, we can see that 
>> POWDER (http://www.w3.org/TR/powder-dr/#assoc-linking) proposes a 
>> number of possible mechanisms.  Many of these mechanisms are 
>> format-dependent, so may not be applicable in all cases, but the HTTP 
>> Link header could be used for any resource for which we have a 
>> dereferenceable URI.  Registering a link-rel type for provenance would 
>> provide a way to signal the availability and URI of an associated 
>> provenance resource.
>>
>> (Another solution based on existing specifications might use WebDAV, 
>> but this is a less obvious fit, and requires a greater degree of 
>> server- and client- side support to deploy.  There may be other 
>> existing standards that could be used: ideas and suggestions are 
>> welcome.)
>>
>> ...
>>
>> The above discussion suggests a minimal set of mechanisms for 
>> provenance discovery and retrieval that are firmly rooted in Web 
>> architecture and existing standards. It is easy to imagine further 
>> situations for which these are insufficient, but to my mind they 
>> represent a (hopefully) non-controversial starting point. I think 
>> anything beyond this needs to be in response to a clearly articulated 
>> problem statement that cannot be adequately addressed using these 
>> basic mechanisms.
>>
>> #g
>> -- 
>>
>>
>
Received on Friday, 20 May 2011 21:56:38 UTC