- From: James Cheney <jcheney@inf.ed.ac.uk>
- Date: Mon, 25 Oct 2010 17:03:59 +0100
- To: Olaf Hartig <hartig@informatik.hu-berlin.de>
- Cc: public-xg-prov@w3.org
On Oct 25, 2010, at 2:53 PM, Olaf Hartig wrote:
> On Monday 25 October 2010 13:32:10 Paolo Missier wrote:
>> Hi,
>> a couple of further comments on this thread:
>>
>> On 25/10/2010 07:41, Olaf Hartig wrote:
>>> Hey,
>>>
>>> On Sunday 24 October 2010 15:50:28 Paul Groth wrote:
>>>> Hi Olaf,
>>>>
>>>> Thanks for the comments. Really good. Some replies in-line
>>>> [...]
>>>> * You speak about "provenance of any web-resource". I still struggle
>>>> to see
>>>
>>> how Web resources, in general, have provenance. To me provenance is
>>> associated primarily with specific representations of Web resources that
>>> we retrieve from the Web.
>>
>> why wouldn't resources have provenance?
>
> The problem is that a Web resource may change; it may have a different state
> at a different point in time. What would the provenance of such a changing
> thing be?
> A specific representation of a Web resource cannot change. That's why I find
> it much easier to talk about the provenance of such representations rather
> than the Web resource itself.
> That's probably also why artifacts in OPM are immutable pieces of state.

Hi Olaf and others,

This seems like an important point. Some of the work on provenance
for data/databases (e.g. by me, Irini, and others) is really about
recording the relationships between past versions (which I'd call
"dynamic" provenance), not just about describing process-step
derivation relationships between immutable artifacts ("static"
provenance).

By analogy, real-world artifacts (e.g. the Mona Lisa) can have
provenance (ownership, modification, or preservation history) even
though they change. In fact, you can't stop physical artifacts from
changing (think radioactive decay), and the fact that such artifacts
can change over time in inessential ways while retaining "identity" is
part of what makes provenance information so important for
establishing authenticity. Knowing it's the same panel painted by da
Vinci, and not a well-executed copy, is part of what makes it
valuable: we can learn things about da Vinci that we can't learn from
a copy.

The situation is complicated further by the fact that digital
artifacts can be copied "exactly". Thus, there may be many minor
variants of a data item floating around the Web, each having been
derived from an original source by a complicated, and currently
invisible, process. So the analogy with physical objects breaks down
a bit. But many Web resources (such as databases) have enough of the
attributes of physical stateful things that the analogy can still make
some sense.

I can imagine wanting to know the (dynamic) provenance, or history, of
a record in a database as part of understanding the static provenance
of a result obtained from the database at a given moment in time. In
particular, a long-running process might have accessed different
versions of a database that was updated during the run, leading to a
result that uses inconsistent data from the different versions.
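
To make that scenario concrete, here is a minimal sketch in Python. Everything in it (the `VersionedStore` class, the store-wide version counter, the `read_log` structure) is a hypothetical illustration, not something taken from OPM or any existing system. The per-record `history` plays the role of dynamic provenance, and the `read_log` kept by the process plays the role of the static provenance of its result.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedStore:
    """Toy key-value store that keeps every past version of every record."""
    version: int = 0
    # key -> list of (store version at write time, value), oldest first
    history: dict = field(default_factory=dict)

    def update(self, key, value):
        """Write a new value; each update bumps the store-wide version."""
        self.version += 1
        self.history.setdefault(key, []).append((self.version, value))

    def read(self, key):
        """Return (current store version, latest value) for a record."""
        _, value = self.history[key][-1]
        return self.version, value


def versions_seen(read_log):
    """Set of store versions a process observed across its reads."""
    return {version for _, version, _ in read_log}


store = VersionedStore()
store.update("price", 10)      # store version 1
store.update("quantity", 3)    # store version 2

read_log = []                  # entries are (key, store version, value)

v, price = store.read("price")
read_log.append(("price", v, price))

store.update("price", 12)      # update during the run, store version 3
store.update("quantity", 2)    # store version 4

v, quantity = store.read("quantity")
read_log.append(("quantity", v, quantity))

total = price * quantity       # combines a version-2 read with a version-4 read
inconsistent = len(versions_seen(read_log)) > 1
```

Here the result mixes the old price with the new quantity, a combination that never existed in any single version of the store. The read log alone shows that two versions were involved; explaining what changed in between requires the per-record history as well.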

I view the WG proposal as encouraging focus and standardization on the
static case, where there are several mature and broadly similar
proposals such as OPM, PML, Provenir, and others. AFAIK there is
currently no broad consensus on how to represent fine-grained, dynamic
provenance/version information (or on how to propagate provenance
through database queries and updates), nor are there mature systems
that do this. This still seems to me like a research issue that it
would be premature to try to standardize, and this discussion thread
suggests there may still be disagreement or confusion about basic
concepts.

So one suggestion I was going to make was that, in addition to
recommending a WG to focus on standardizing a consensus exchange
format, we might propose an interest group or low-maintenance activity
to facilitate discussion and convergence on broader provenance issues
(such as dynamic provenance in databases or RDF stores, provenance
querying, etc.) and to revisit the case for standards as these areas
mature. I'm not sure how one makes a case for this, though.

This got a bit long-winded. Thoughts?

--James
Received on Monday, 25 October 2010 16:05:08 UTC