- From: Luc Moreau <L.Moreau@ecs.soton.ac.uk>
- Date: Wed, 25 May 2011 16:13:12 +0100
- To: Graham Klyne <GK@ninebynine.org>
- CC: public-prov-wg@w3.org
On 05/25/2011 02:54 PM, Graham Klyne wrote:
> Luc Moreau wrote:
>> Nothing in the example is restricted to rdf or triple stores.
>> It also applies to a table in a relational database (and its xml
>> serialization),
>> or an excel spreadsheet (and a csv representation).
>
> Luc,
>
> You're right. When I made my previous comments, I was referring to
> your illustration inspired by the example (repeated at the end of this
> email).
>
> But I did go back and re-check the example at
> http://www.w3.org/2011/prov/wiki/ProvenanceExample and in light of our
> discussion I think I see a problem there (tangentially related to our
> discussion of "containers").
>
> I repeat here the first steps of the example for ease of reference:
> [[
> government (gov) converts data (d1) to RDF (f1) at time (t1)
> government (gov) generates provenance information (prov) regarding RDF
> (f1)
> government (gov) publishes RDF data (f1) along with its provenance
> (prov) on a portal with a license (li1); the rdf data is now available
> as a Web resource (r1)
> analyst (alice) downloads a turtle serialization (lcp1) of the
> resource (r1) from government portal
> ]]
>
> Based on your comments, I think that "f1" is intended to be a local
> (non-published) copy of the RDF data. As such, I'm not sure it makes
> sense to generate and subsequently publish provenance "prov" about
> "f1", because when "f1" is copied to a publication location and made
> available as "r1", "prov" is still about the unpublished "f1". The
> process of publication is part of the provenance of "r1", which is
> absent from the provenance of "f1".
>
> And while this may seem like a discourse about the cardinality of a
> set of angels dancing on a pinhead, I think there are some potentially
> serious implications:
>
> Suppose there are two routes to publication that can be employed by
> (gov) - e.g. two different employees who might handle the publication
> process.
> And suppose one uses a PC and the other uses a Mac computer
> to perform the publication process. Under certain circumstances, the
> line endings of text files processed may be handled differently by
> these different systems, possibly resulting in different published
> content (r1). Here the outcome is likely benign. But suppose that it
> is later discovered that the PC contains Malware that randomly
> corrupts data that is being processed. Now it can become important to
> know what systems were used to perform the publication, as that
> affects the reliability of the published result. Surely, this MUST be
> reflected in a complete provenance record, for any useful definition
> of "complete"?
>
> The point is that (prov) calculated from (f1) is NOT the provenance of
> (r1), but as stated the example publishes (prov) as if it IS the
> provenance of (r1).
>

You're right. The example needs to be fixed.

I think that the step "gov publishes prov" is the remit of the
provenance access/query task force. It will have to decide how to do that.

Luc

> I have a hunch that once we get this bit right, handling of dynamic
> resources may not need to appeal to the notion of a "container".
>
> (FWIW, where you appealed to l-values and r-values, I would look
> towards a functional programming model where there are just values to
> consider, and where each such value has a provenance. But such values
> are not simply extensionally defined, but must in some sense take
> account of the context in which they occur - as the above example
> about (f1) and (r1) - as well as their specific content. I can
> imagine that it is this notion of context which you see the container
> supplying. But I think that to do so conflates the notions of context
> and dynamic update.)
>
> #g
> --
>
>
>>>> Illustration inspired by the example.
>>>>
>>>> - government (gov) converts data (d1) to RDF file (f1) at time (t1)
>>>> using XSLT transform
>>>> - government (gov) uploads RDF data (f1) into a triple store,
>>>> exposed as Web resource (r1)
>>>> - analyst (alice) downloads a turtle serialization (lcp1) of the
>>>> resource (r1) from government portal
>>>>
>>>> Illustrations:
>>>> - r1: is a resource: it's the triple store, it's a container, its
>>>> content can vary over time
>>>> - lcp1: is a r-text (turtle serialization) of a given snapshot
>>>> (created by, or available at the time of, download)
>>>> - f1 is a local file: it can be seen as a stateless anonymous
>>>> resource, with a single r-text.
>>>>
>>>> If in addition:
>>>> - analyst (alice) downloads a rdf/xml serialization (lcp2) of the
>>>> resource (r1)
>>>>
>>>> If the content of r1 has not changed, then lcp2 and lcp1 are both
>>>> r-texts of the same r-snapshot.
>>>>
>>>> Note that this is not limited to RDF (as Graham mentioned)
>>>>
>>>> - newspaper (news), uses a CMS to publish the incidence map (map1),
>>>> chart (c1) and
>>>> the image (img1) within a document (art1) written by (joe) using
>>>> license (li2)
>>>> - newspaper (news), updates art1, adding a correction following a
>>>> complaint from a reader
>>>>
>>>> Illustrations:
>>>> - art1 is also a resource, with two r-snapshots (before and after
>>>> correction)
>>>> - with language negotiation, an http client can download html and
>>>> xhtml representations (i.e., r-texts) of the article
>
>

--
Professor Luc Moreau
Electronics and Computer Science   tel:   +44 23 8059 4487
University of Southampton          fax:   +44 23 8059 2865
Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
United Kingdom                     http://www.ecs.soton.ac.uk/~lavm
Received on Wednesday, 25 May 2011 15:13:52 UTC