- From: James Cheney <jcheney@inf.ed.ac.uk>
- Date: Mon, 25 Oct 2010 17:03:59 +0100
- To: Olaf Hartig <hartig@informatik.hu-berlin.de>
- Cc: public-xg-prov@w3.org
On Oct 25, 2010, at 2:53 PM, Olaf Hartig wrote:

> On Monday 25 October 2010 13:32:10 Paolo Missier wrote:
>> Hi,
>> a couple of further comments on this thread:
>>
>> On 25/10/2010 07:41, Olaf Hartig wrote:
>>> Hey,
>>>
>>> On Sunday 24 October 2010 15:50:28 Paul Groth wrote:
>>>> Hi Olaf,
>>>>
>>>> Thanks for the comments. Really good. Some replies in-line
>>>> [...]
>>>> * You speak about "provenance of any web-resource". I still struggle to see
>>>
>>> how Web resources, in general, have provenance. To me provenance is
>>> associated primarily with specific representations of Web resources that
>>> we retrieve from the Web.
>>
>> why wouldn't resources have provenance?
>
> The problem is that a Web resource may change; it may have a different state at
> a different point in time. What would the provenance of such a changing thing
> be?
> A specific representation of a Web resource cannot change. That's why I find it
> much easier to talk about the provenance of such representations rather than
> the Web resource itself.
> That's probably also why artifacts in OPM are immutable pieces of state.

hi Olaf and others,

This seems like an important point. Some of the work on provenance for data/databases (e.g. by me and Irini and others) is really about recording the relationships between past versions (which I'd call "dynamic" provenance), not just about describing process-step derivation relationships between immutable artifacts ("static" provenance).

By analogy, real-world artifacts (e.g. the Mona Lisa) can have provenance (ownership, modification or preservation history) even though they change - in fact, you can't stop physical artifacts from changing (think radioactive decay), and the fact that such artifacts can change over time in inessential ways while retaining "identity" is part of what makes provenance information so important for establishing authenticity.
Knowing it's the same canvas painted by da Vinci, and not a well-executed copy, is part of what makes it valuable: we can learn things about da Vinci that we can't learn from a copy.

The situation is complicated further by the fact that digital artifacts can be copied "exactly". Thus, there may be many minor variants of a data item floating around the Web, each having been derived from an original source by a complicated, and currently invisible, process. So the analogy with physical objects breaks down a bit. But many Web resources (such as databases) have enough of the attributes of physical stateful things that the analogy can still make some sense.

I can imagine wanting to know the (dynamic) provenance, or history, of a record in a database as part of understanding the static provenance of a result obtained from the database at a given moment in time. In particular, a long-running process might have accessed different versions of a database that was updated during the run, leading to a result that uses inconsistent data from the different versions.

I view the WG proposal as encouraging focus and standardization on the static case, where there are several mature and broadly similar proposals such as OPM, PML, Provenir, and others. There is currently no broad consensus for representing fine-grained, dynamic provenance/version information AFAIK (or for propagation of provenance through database queries and updates), nor are there mature systems that do this. This still seems like a research issue to me which would be premature to try to standardize, and this discussion thread suggests there may still be disagreement or confusion about basic concepts.
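To make the long-running-process scenario above concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: `VersionedStore` and `Run` are hypothetical names, not part of OPM, PML, Provenir, or any existing system. The store keeps the full history of each record (the "dynamic" provenance), and the run logs which store version each read saw (part of the "static" provenance of its result).

```python
from dataclasses import dataclass, field


@dataclass
class VersionedStore:
    """Hypothetical toy store that keeps every past value of every record."""
    version: int = 0
    # key -> list of (store_version, value), oldest first
    history: dict = field(default_factory=dict)

    def update(self, key, value):
        """Write a new value and bump the store-wide version counter."""
        self.version += 1
        self.history.setdefault(key, []).append((self.version, value))

    def read(self, key):
        """Return (store version at read time, current value of key)."""
        _, value = self.history[key][-1]
        return self.version, value


@dataclass
class Run:
    """Hypothetical process run that logs every read it performs."""
    reads: list = field(default_factory=list)  # (key, store_version, value)

    def read(self, store, key):
        version, value = store.read(key)
        self.reads.append((key, version, value))
        return value

    def versions_seen(self):
        return {version for _, version, _ in self.reads}

    def is_consistent(self):
        """Conservative check: True only if all reads saw one store version."""
        return len(self.versions_seen()) <= 1


store = VersionedStore()
store.update("a", 100)  # store version 1
store.update("b", 0)    # store version 2

run = Run()
a = run.read(store, "a")  # sees version 2, a = 100

# The database is updated while the run is still in progress:
# 50 units are moved from record "a" to record "b".
store.update("a", 50)   # store version 3
store.update("b", 50)   # store version 4

b = run.read(store, "b")  # sees version 4, b = 50
total = a + b             # mixes values from two different versions

print(total, run.is_consistent(), sorted(run.versions_seen()))
```

The computed total does not match the sum of the two records at any single version of the store, and only the per-record history together with the read log makes that visible. The consistency check here is deliberately conservative: it flags any run that saw more than one store version, even if the records it read had not changed in between.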
So one suggestion I was going to make was that, in addition to recommending a WG to focus on standardizing a consensus exchange format, we might propose an interest group or low-maintenance activity to facilitate discussion and convergence on broader provenance issues, such as dynamic provenance in databases or RDF stores, provenance querying, etc., and to revisit the case for standards as these areas mature (I'm not sure how one makes a case for this though).

This got a bit long-winded. Thoughts?

--James

--
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
Received on Monday, 25 October 2010 16:05:08 UTC