Re: On provenance access and web architecture

Hi Graham,

You are raising lots of good questions.

But is it the case that provenance should always be accessible by HTTP?
Can't it be embedded in a document itself?  These are issues that the XG
report was trying to discuss.

It would be good to revisit them, even if only to agree that provenance
is always to be downloaded from a URL.
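
To make the two options concrete, here is a rough sketch (Python,
standard library only) of how a client might look for a provenance URI
in both places: in an HTTP Link header, as you propose below, and in a
link element embedded in the HTML document itself.  The relation name
"provenance" is an assumption, since no such link-rel type has been
registered, and the function names are mine.  The Link header parsing
is simplified and ignores commas inside quoted parameters.

```python
import re
from html.parser import HTMLParser

# Hypothetical link relation; no such type is registered.
PROV_REL = "provenance"


def provenance_from_link_header(header_value):
    """Return target URIs of Link header entries whose rel includes PROV_REL."""
    uris = []
    # Each link-value looks like: <uri>; rel="a b"; other=params
    for match in re.finditer(r'<([^>]*)>([^,<]*)', header_value):
        uri, params = match.group(1), match.group(2)
        rel = re.search(r';\s*rel\s*=\s*"?([^";]*)"?', params, re.IGNORECASE)
        if rel and PROV_REL in rel.group(1).lower().split():
            uris.append(uri)
    return uris


class _ProvLinkFinder(HTMLParser):
    """Collect href values of <link> elements whose rel includes PROV_REL."""

    def __init__(self):
        super().__init__()
        self.uris = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if PROV_REL in rels and attr.get("href"):
            self.uris.append(attr["href"])


def provenance_from_html(html_text):
    """Return provenance URIs embedded in an HTML document's link elements."""
    finder = _ProvLinkFinder()
    finder.feed(html_text)
    return finder.uris
```

Either function returns a list of candidate URIs; an embedded relative
href would still need to be resolved against the document's own URI.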

The XG final report had to be limited in size. Regarding architecture,
we made a presentation, which I have uploaded to the wiki [1].  It may
clarify a few things.

Cheers,
Luc


[1] http://www.w3.org/2011/prov/wiki/File:Xg-architecture.pdf



On 05/19/2011 05:45 PM, Graham Klyne wrote:
> I've been thinking a bit about our discussions about using the 
> provenance XG final report, specifically section 6 
> (http://www.w3.org/2005/Incubator/prov/XGR-prov-20101214/#Provenance_in_Web_Architecture). 
>
>
> In this message, I hope to stand back a little and sketch an initial 
> approach to how provenance access can be addressed within the web 
> architecture.  I don't claim to offer a complete solution, but it is 
> one that I suspect will be sufficient in a large number of cases.  
> Rather than starting from the provenance XG final report, I start from 
> this working group's charter and the published architecture of the 
> world wide web.  This is not necessarily incompatible with the 
> direction suggested by the provenance XG final report, but I think 
> the emphasis and perspective may be rather different.
>
> ...
>
> The charter for this WG says:
>
> "... specifies how provenance can be accessed or queried in embedded 
> documents and from remote services. Specifically, it defines how to 
> access provenance embedded in an HTML document using RDFa, how to 
> access provenance from a service by means of HTTP, and how to query 
> provenance through a SPARQL endpoint."
> -- http://www.w3.org/2011/01/prov-wg-charter
>
> To my mind, the starting point for accessing provenance information on 
> the web should be simple:  just use HTTP.  The remaining issues, then, 
> are (a) how to know that provenance information is available, and (b) 
> what URI to use to retrieve it.  And POWDER seems to address these 
> concerns.  The charter suggests some variations/extensions of this 
> idea, but I'd like to focus first on the simple case.
>
> I take http://www.w3.org/TR/webarch/ (AWWW) as my starting point for a 
> description of web architecture.  Right at the start (section 2), this 
> document addresses identification, and asserts "To benefit from and 
> increase the value of the World Wide Web, agents should provide URIs 
> as identifiers for resources."
>
> ...
>
> So I think that one of the first questions to ask concerning how 
> provenance access works within the web architecture is:
>
> "What resources do we recognize and identify with URIs?"
>
> My answer would start with:
> (1) resources about which we wish to assert provenance information
> (2) resources that are (contain?) provenance information about other 
> resources (to be useful, we would generally assume these are 
> dereferenceable on the web).
>
> The charter also suggests "provenance ... in embedded documents ... 
> specifically ... RDFa".  I think this needs clarification, but suggests:
> (3) resources that contain both textual information and provenance 
> information
>
> Thinking about embedded provenance also suggests:
> (4) resources that contain a resource (state) representation *and* 
> provenance about that resource
>
> The discussion that follows makes no assumptions about the data 
> format used for resource or provenance data (cf. AWWW section 5.1).
>
> ...
>
> Next question: "How can provenance information be accessed?"
>
> Having identified resources and URIs, I think the initial mechanism 
> for retrieving information given the corresponding URI is simple:  
> just use web retrieval mechanisms.   This (and more) is discussed in 
> AWWW section 3.
>
> Specifically, given the URI of a provenance resource, a simple 
> mechanism is to use HTTP to retrieve a representation of that provenance.
>
> ...
>
> The next question I then see is:  "given some resource URI, how do I 
> discover if there is provenance information associated with the 
> resource, and what URI can I use to retrieve that provenance 
> information?"
>
> Looking for existing, established web protocols, we can see that 
> POWDER (http://www.w3.org/TR/powder-dr/#assoc-linking) proposes a 
> number of possible mechanisms.  Many of these mechanisms are 
> format-dependent, so may not be applicable in all cases, but the HTTP 
> Link header could be used for any resource for which we have a 
> dereferenceable URI.  Registering a link-rel type for provenance would 
> provide a way to signal the availability and URI of an associated 
> provenance resource.
>
> (Another solution based on existing specifications might use WebDAV, 
> but this is a less obvious fit, and requires a greater degree of 
> server- and client-side support to deploy.  There may be other 
> existing standards that could be used: ideas and suggestions are 
> welcome.)
>
> ...
>
> The above discussion suggests a minimal set of mechanisms for 
> provenance discovery and retrieval that are firmly rooted in Web 
> architecture and existing standards. It is easy to imagine further 
> situations for which these are insufficient, but to my mind they 
> represent a (hopefully) non-controversial starting point. I think 
> anything beyond this needs to be in response to a clearly articulated 
> problem statement that cannot be adequately addressed using these 
> basic mechanisms.
>
> #g
> -- 
>
>

-- 
Professor Luc Moreau
Electronics and Computer Science   tel:   +44 23 8059 4487
University of Southampton          fax:   +44 23 8059 2865
Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
United Kingdom                     http://www.ecs.soton.ac.uk/~lavm

Received on Friday, 20 May 2011 11:59:26 UTC