
Re: The Provenance Spectrum....

From: Michel Dumontier <michel.dumontier@gmail.com>
Date: Thu, 22 Sep 2011 23:26:48 +0200
Message-ID: <CALcEXf6jm9=fEK+T3M39SHj7HJ=Ycqbte2zHwhcGM1tdjU_2nA@mail.gmail.com>
To: Satya Sahoo <sahoo.2@wright.edu>
Cc: Joanne Luciano <jluciano@gmail.com>, public-semweb-lifesci hcls <public-semweb-lifesci@w3.org>
I'm glad to see that the incubator group worked hard to think through the
issues and record them, a testament to the growing recognition of the
importance of provenance for (e-)science.

A good starting point:

"provenance is often represented as metadata, but not all metadata is
necessarily provenance"

"Descriptive metadata of a resource only becomes part of its provenance when
one also specifies its relationship to deriving the resource."

does not adequately describe how to identify the conditions under which
metadata becomes provenance.

"Provenance of a resource is a record that describes entities and processes
involved in producing and delivering or otherwise influencing that resource"

contains elements that are undefined (record), uncertain (are processes not
also entities?), narrow (producing/delivering) and broad (influencing).

Of course, I appreciate the difficulty of crafting a good definition, and I
understand that this is a definition with which useful work can be done.
Still, I will take the opportunity to express my thoughts on the matter.

I think there are two key aspects to provenance (not unlike what is
suggested here: http://www.springerlink.com/content/edf0k68ccw3a22hu/):
1. How did the resource come about? (relates to creation and justification)
 -> important for reproducibility (which is an element of science)
 -> includes attribution (who created the resource), creation (the process
that generated the resource), reproduction (the process by which a copy was
made), derivation (the process by which the resource was generated from
another resource or a portion of one), and versioning (the process of keeping
count of sequential derivations)

2. What is the history of the resource (from the point of creation)?
 -> important for authenticity
 -> includes origin, possession, and the acts of transfer
Both have implications for trust, and can be used for accountability, among
other things.

I find this part on recommendations of a provenance framework quite nice:

but I get less excited when I see the collection of "provenance concepts",

particularly because we need to simplify the discourse so that we consider

an event (for 1 above)
 - participants (and their roles; e.g. agents, targets, products)
 - locations
 - time instants (e.g. action timestamps) and durations (processual

and a sequence of events (for both 1 and 2 above)

This would certainly help us produce a specification with a minimal set of
classes and relations for expressing this kind of information.
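The event-centred model above can be sketched as a handful of Python classes. This is an illustration under stated assumptions only: the names `Participant`, `Event` and `history` are hypothetical, roles are free-text strings, and a sequence of events for an entity is taken to be the events it participated in, ordered by start time (events without a start time are skipped).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Participant:
    entity: str
    role: str  # e.g. "agent", "target", "product"


@dataclass
class Event:
    name: str
    participants: list[Participant]
    location: Optional[str] = None
    start: Optional[datetime] = None  # time instant, e.g. an action timestamp
    end: Optional[datetime] = None    # with start, gives the event's duration

    def duration(self) -> Optional[timedelta]:
        if self.start is None or self.end is None:
            return None
        return self.end - self.start

    def entities_in_role(self, role: str) -> list[str]:
        return [p.entity for p in self.participants if p.role == role]


def history(events: list[Event], entity: str) -> list[Event]:
    """Sequence of timed events the entity took part in, ordered by start time."""
    involved = [e for e in events
                if e.start is not None
                and any(p.entity == entity for p in e.participants)]
    return sorted(involved, key=lambda e: e.start)


# Usage with made-up data: an assay produces a dataset that is later deposited.
t0 = datetime(2011, 9, 22, 9, 0)
assay = Event("assay",
              [Participant("alice", "agent"),
               Participant("sample-42", "target"),
               Participant("dataset-1", "product")],
              location="lab-A", start=t0, end=t0 + timedelta(hours=2))
deposit = Event("deposit",
                [Participant("alice", "agent"), Participant("dataset-1", "target")],
                location="repository", start=t0 + timedelta(days=1))

events = [deposit, assay]  # deliberately out of order
print([e.name for e in history(events, "dataset-1")])  # ['assay', 'deposit']
print(assay.duration())                                 # 2:00:00
print(assay.entities_in_role("product"))                # ['dataset-1']
```

Note that the same `history()` call serves both aspects: the assay answers how `dataset-1` came about, and the later deposit is part of its history after creation.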

Now, I'm writing this late at night, and I appreciate that I may not have
considered all the issues that the provenance group (along with others who
have written about the subject) has, but perhaps there are still some good
discussions to be had about provenance and how we formally represent it, as it
is of strategic importance to the HCLSIG in our current and future efforts.



On Thu, Sep 22, 2011 at 7:13 PM, Satya Sahoo <sahoo.2@wright.edu> wrote:

> Hi Joanne and Scott,
> In the Provenance incubator we agreed on the following definition:
> Provenance of a resource is a record that describes entities and processes
> involved in producing and delivering or otherwise influencing that resource.
> Provenance provides a critical foundation for assessing authenticity,
> enabling trust, and allowing reproducibility. Provenance assertions are a
> form of contextual metadata and can themselves become important records with
> their own provenance.
> A couple of key points in the above definition that will hopefully help to
> "draw the line in the definition": metadata; recording past events (the use
> of the temporal dimension is critical to the definition of provenance and has
> consensus in the current provenance WG as well [1]); a foundation for trust
> and reproducibility (these are often taken to be synonymous with provenance
> but are actually derived from, or uses of, provenance); and contextual, in
> other words, each domain/application defines its own set of provenance
> terms.
> Hope this helps.
> Best,
> Satya Sahoo
> http://cci.case.edu/cci/index.php/Satya_Sahoo
> [1] http://www.w3.org/2011/prov/wiki/Main_Page
> ----- Original Message -----
> From: Joanne Luciano <jluciano@gmail.com>
> Date: Thursday, September 22, 2011 12:34 pm
> Subject: The Provenance Spectrum....
> To: public-semweb-lifesci hcls <public-semweb-lifesci@w3.org>
> > Thank you, Scott, for suggesting that we move the discussion to the
> > mailing list... and to include the provenance working group.
> >
> > What is provenance? Where do we draw the line in the definition?
> >
> > Our HCLSIG TMO discussion today was reminiscent of the 'what is an
> > ontology?' discussion.  And this is a good discussion (the one we just had
> > on the HCLSIG call).  I would like to settle the argument of what is
> > provenance and what isn't by suggesting (and claiming, if no one else has
> > already claimed it, and if so, then agreeing) that there is a Provenance
> > Spectrum, and inviting my esteemed colleagues to fill in the provenance
> > spectrum ... and let's create a nice graphic to go with it that we can all
> > use.
> >
> > Let's add UTILITY, if we can, so that as we move across the spectrum, we
> > get more out of including more in the provenance definition.  I noticed
> > that many of us have spent a lot of time creating metadata terms and
> > standards to address the problems with legacy data, for the purpose of
> > integration, for example, but that if these metadata are included as
> > "provenance", then many questions become easier to answer.  I learned this
> > when I went to the IPAW conference in June 2010 at RPI.  I suggest people
> > check the papers there. I was impressed.
> >
> > And let's invite the Provenance working group to the discussion (or join
> > their discussion).
> >
> > Cheers,
> > Joanne
> >
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > Joanne S. Luciano, PhD
> > Research Associate Professor
> > Tetherless World Constellation
> > Deputy Director, WebScience
> > Rensselaer Polytechnic Institute
> > 110 8th Street, Winslow 2143
> > Troy, NY 12180, USA
> > Email: jluciano@cs.rpi.edu
> > Office Tel. +1.518.276.4939
> > Global Tel. +1.617.440.4364 (skypeIn)
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Michel Dumontier
Associate Professor of Bioinformatics
Carleton University
Received on Thursday, 22 September 2011 21:34:55 UTC
