Re: The Provenance Spectrum....

From: Michel Dumontier <michel.dumontier@gmail.com>
Date: Thu, 22 Sep 2011 23:26:48 +0200
Message-ID: <CALcEXf6jm9=fEK+T3M39SHj7HJ=Ycqbte2zHwhcGM1tdjU_2nA@mail.gmail.com>
To: Satya Sahoo <sahoo.2@wright.edu>
Cc: Joanne Luciano <jluciano@gmail.com>, public-semweb-lifesci hcls <public-semweb-lifesci@w3.org>
As a testament to the growing recognition of provenance for (e-)science, I'm
glad to see that the incubator group worked hard to think about the issues
and record them.

a good starting point:

"provenance is often represented as metadata, but not all metadata is
necessarily provenance"

"Descriptive metadata of a resource only becomes part of its provenance when
one also specifies its relationship to deriving the resource."

but this does not adequately describe the conditions under which that happens.

"Provenance of a resource is a record that describes entities and processes
involved in producing and delivering or otherwise influencing that resource"

This definition contains elements that are undefined (record), uncertain (are
processes not also entities?), narrow (producing/delivering) and broad
(influencing).

Of course, I appreciate the difficulty in crafting a good definition, and I
understand that this is a definition from which useful work can be
achieved.  I will take the opportunity to express my thoughts on the matter.

I think there are two key aspects to provenance (not unlike what is
suggested here: http://www.springerlink.com/content/edf0k68ccw3a22hu/)
1. how did the resource come about? (relates to creation and justification)
 -> important for reproducibility (which is an element of science)
 -> includes attribution (who created the resource), creation (process that
generated the resource), reproduction (process in which a copy was
made), derivation (process in which the resource was generated from some
resource or portion of a resource), versioning (process of keeping count of
sequential derivations)

2. what is the history of the resource? (from the point of creation)
 -> important for authenticity
 -> includes origin, possession and the acts of transfer

Both have implications for trust, and can be used for accountability, among
other things.

I find this part on recommendations of a provenance framework quite nice:

but I get less excited when I see the collection of "provenance concepts",

particularly because we need to simplify the discourse so that we consider

an event (for 1 above)
 - participants (and their roles; e.g. agents, targets, products)
 - locations
 - time instants (e.g. action timestamps) and durations (processual

and a sequence of events (for both 1 and 2 above)

this would certainly help to generate a specification with a minimal set of
classes and relations to express this kind of information.
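
To make the idea concrete, here is a minimal sketch in Python of what such a
small set of classes might look like. The names (Participant, Event, history),
the role strings and the example values are all illustrative assumptions, not
taken from the incubator group's work or from any provenance specification.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Participant:
    """Something taking part in an event, with the role it plays there."""
    name: str
    role: str  # hypothetical role labels, e.g. "agent", "target", "product"


@dataclass
class Event:
    """A single event: participants, an optional location, a time and duration."""
    label: str
    timestamp: datetime
    participants: List[Participant] = field(default_factory=list)
    location: Optional[str] = None
    duration: Optional[timedelta] = None

    def with_role(self, role: str) -> List[str]:
        """Names of the participants that play the given role in this event."""
        return [p.name for p in self.participants if p.role == role]

    @property
    def end(self) -> datetime:
        """End time of the event; equals the timestamp if no duration is given."""
        if self.duration is None:
            return self.timestamp
        return self.timestamp + self.duration


def history(events: List[Event]) -> List[Event]:
    """A sequence of events, ordered by when each one started."""
    return sorted(events, key=lambda e: e.timestamp)


# Aspect 1: how the resource came about (a creation event).
created = Event(
    label="creation",
    timestamp=datetime(2011, 9, 1, 10, 0),
    participants=[
        Participant("alice", "agent"),
        Participant("dataset-v1", "product"),
    ],
    location="lab A",
    duration=timedelta(hours=2),
)

# Aspect 2: what happened to it afterwards (an act of transfer).
transferred = Event(
    label="transfer",
    timestamp=datetime(2011, 9, 5, 9, 0),
    participants=[
        Participant("alice", "agent"),
        Participant("dataset-v1", "target"),
    ],
)

timeline = history([transferred, created])
```

Here attribution is just the "agent" participants of the creation event, and
the history of the resource is the ordered sequence returned by history().
Whether roles should be free strings or a fixed vocabulary is left open.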

Now, I'm writing this late at night, and I appreciate that I may not have
considered all the issues that the provenance group has (along with others
who have written about the subject), but perhaps there are still some good
discussions to be had wrt provenance and how we formally represent it, as it
is of strategic importance to the HCLSIG in our current and future efforts.



Michel Dumontier
Associate Professor of Bioinformatics
Carleton University
