- From: Graham Klyne <GK@ninebynine.org>
- Date: Wed, 25 May 2011 14:54:51 +0100
- To: Luc Moreau <L.Moreau@ecs.soton.ac.uk>
- CC: public-prov-wg@w3.org
Luc Moreau wrote:
> Nothing in the example is restricted to rdf or triple stores.
> It also applies to a table in a relational database (and its xml
> serialization),
> or an excel spreadsheet (and a csv representation).

Luc,

You're right. When I made my previous comments, I was referring to your illustration inspired by the example (repeated at the end of this email).

But I did go back and re-check the example at http://www.w3.org/2011/prov/wiki/ProvenanceExample and in light of our discussion I think I see a problem there (tangentially related to our discussion of "containers"). I repeat here the first steps of the example for ease of reference:

[[
government (gov) converts data (d1) to RDF (f1) at time (t1)
government (gov) generates provenance information (prov) regarding RDF (f1)
government (gov) publishes RDF data (f1) along with its provenance (prov) on a portal with a license (li1); the rdf data is now available as a Web resource (r1)
analyst (alice) downloads a turtle serialization (lcp1) of the resource (r1) from government portal
]]

Based on your comments, I think that "f1" is intended to be a local (non-published) copy of the RDF data. As such, I'm not sure it makes sense to generate and subsequently publish provenance "prov" about "f1", because when "f1" is copied to a publication location and made available as "r1", "prov" is still about the unpublished "f1". The process of publication is part of the provenance of "r1", but it is absent from the provenance of "f1".

And while this may seem like a discourse about the cardinality of a set of angels dancing on a pinhead, I think there are some potentially serious implications:

Suppose there are two routes to publication that can be employed by (gov) - e.g. two different employees who might handle the publication process. And suppose one uses a PC and the other uses a Mac computer to perform the publication process.
Under certain circumstances, the line endings of text files processed may be handled differently by these different systems, possibly resulting in different published content (r1). Here the outcome is likely benign. But suppose that it is later discovered that the PC contains malware that randomly corrupts data that is being processed. Now it can become important to know what systems were used to perform the publication, as that affects the reliability of the published result. Surely, this MUST be reflected in a complete provenance record, for any useful definition of "complete"?

The point is that (prov) calculated from (f1) is NOT the provenance of (r1), but as stated the example publishes (prov) as if it IS the provenance of (r1).

I have a hunch that once we get this bit right, handling of dynamic resources may not need to appeal to the notion of a "container".

(FWIW, where you appealed to l-values and r-values, I would look towards a functional programming model where there are just values to consider, and where each such value has a provenance. But such values are not simply extensionally defined: they must in some sense take account of the context in which they occur - as in the above example about (f1) and (r1) - as well as their specific content. I can imagine that it is this notion of context which you see the container supplying. But I think that to do so conflates the notions of context and dynamic update.)

#g
--

>>> Illustration inspired by the example.
>>>
>>> - government (gov) converts data (d1) to RDF file (f1) at time (t1)
>>>   using XSLT transform
>>> - government (gov) uploads RDF data (f1) into a triple store, exposed
>>>   as Web resource (r1)
>>> - analyst (alice) downloads a turtle serialization (lcp1) of the
>>>   resource (r1) from government portal
>>>
>>> Illustrations:
>>> - r1: is a resource: it's the triple store, it's a container, its
>>>   content can vary over time
>>> - lcp1: is a r-text (turtle serialization) of a given snapshot
>>>   (created by, or available at the time of, download)
>>> - f1 is a local file: it can be seen as a stateless anonymous
>>>   resource, with a single r-text.
>>>
>>> If in addition:
>>> - analyst (alice) downloads a rdf/xml serialization (lcp2) of the
>>>   resource (r1)
>>>
>>> If the content of r1 has not changed, then lcp2 and lcp1 are both
>>> r-texts of a same r-snapshot.
>>>
>>> Note that this is not limited to RDF (as Graham mentioned)
>>>
>>> - newspaper (news), uses a CMS to publish the incidence map (map1),
>>>   chart (c1) and
>>>   the image (img1) within a document (art1) written by (joe) using
>>>   license (li2)
>>> - newspaper (news), updates art1, adding a correction following a
>>>   complaint from a reader
>>>
>>> Illustrations:
>>> - art1 is also a resource, with two r-snapshots (before and after
>>>   correction)
>>> - with language negotiation, an http client can download html and
>>>   xhtml representations (i.e., r-texts) of the article
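
The distinction drawn above between the provenance of (f1) and the provenance of (r1) can be illustrated with a minimal sketch in Python. Everything in it is an assumption made for illustration: the Step and Artifact classes, the "system" field, and the byte contents are hypothetical and are not taken from the wiki example or from any provenance vocabulary. It only shows that a record computed for (f1) lacks the publication step, and that two publication routes yield different records for (r1).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Step:
    """One process execution in a provenance chain (hypothetical model)."""
    activity: str  # e.g. "convert", "publish"
    agent: str     # who performed the step
    system: str    # machine used; an assumed field, not from the example


@dataclass(frozen=True)
class Artifact:
    """A value together with the step and source it was derived from."""
    name: str
    content: bytes
    derived_from: Optional["Artifact"] = None
    step: Optional[Step] = None


def provenance(artifact: Artifact) -> Tuple[Step, ...]:
    """Return every step that led to this artifact, oldest first."""
    steps = []
    node = artifact
    while node is not None and node.step is not None:
        steps.append(node.step)
        node = node.derived_from
    return tuple(reversed(steps))


# d1 -> f1 (conversion) -> r1 (publication), via two possible routes.
d1 = Artifact("d1", b"raw,data\n")
f1 = Artifact("f1", b"<rdf/>\n", d1, Step("convert", "gov", "workstation"))
r1_pc = Artifact("r1", b"<rdf/>\r\n", f1, Step("publish", "gov", "PC"))
r1_mac = Artifact("r1", b"<rdf/>\n", f1, Step("publish", "gov", "Mac"))

prov = provenance(f1)  # what the example publishes as provenance

# prov is only a prefix of the provenance of r1: the publish step is missing.
print(prov == provenance(r1_pc))        # False
print(provenance(r1_pc)[-1].system)     # PC
print(provenance(r1_pc) == provenance(r1_mac))  # False
```

In this sketch the published record (prov) cannot tell a reader whether the PC or the Mac was used, which is exactly the information needed if one of the two systems later turns out to be unreliable.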
Received on Wednesday, 25 May 2011 14:52:19 UTC