Re: The Provenance Spectrum....

From: Helena Deus <helenadeus@gmail.com>
Date: Sat, 1 Oct 2011 12:19:35 +0100
Message-ID: <CAPkJ_9mUU87JoysqB0xc_dqxSi=5v131GTMYWCvOY4YgrHi9gw@mail.gmail.com>
To: Michel Dumontier <michel.dumontier@gmail.com>
Cc: Satya Sahoo <sahoo.2@wright.edu>, Joanne Luciano <jluciano@gmail.com>, public-semweb-lifesci hcls <public-semweb-lifesci@w3.org>
Hello Michel,

The latest work of the provenance WG is here:
http://dvcs.w3.org/hg/prov/raw-file/tip/model/ProvenanceModel.html
It illustrates graphically both the provenance model and how provenance
assertions can be created. My role in the prov WG is to make sure that there
is a smooth transition from the model to its application in realistic
scenarios, many of which come from life sciences domains.

Kind regards,
Helena

On Thu, Sep 22, 2011 at 10:26 PM, Michel Dumontier <michel.dumontier@gmail.com> wrote:

> As a testament to the growing recognition of provenance for (e-)science,
> i'm glad to see that the incubator group worked hard to think about the
> issues and record them.
>
> a good starting point:
>
> "provenance is often represented as metadata, but not all metadata is
> necessarily provenance"
>
> http://www.w3.org/2005/Incubator/prov/XGR-prov-20101214/#Provenance_and_Metadata
>
> but
> "Descriptive metadata of a resource only becomes part of its provenance
> when one also specifies its relationship to deriving the resource."
>
> does not provide adequate description for identifying the conditions.
>
> and
> "Provenance of a resource is a record that describes entities and
> processes involved in producing and delivering or otherwise influencing that
> resource"
>
> contains elements that are undefined (record), uncertain (are processes not
> also entities?), narrow (producing/delivering) and broad (influencing).
>
> Of course, I appreciate the difficulty in crafting a good definition, and I
> understand that this is a definition from which useful work can be
> achieved.  I will take the opportunity to express my thoughts on the matter.
>
> i think there are two key aspects to provenance (not unlike what is
> suggested here: http://www.springerlink.com/content/edf0k68ccw3a22hu/)
> 1. how did the resource come about? (relates to creation and justification)
>  -> important for reproducibility (which is an element of science)
>  -> includes attribution (who created the resource), creation (process that
> generated the resource), reproduction (process in which a copy was
> made), derivation (process in which the resource was generated from some
> resource or portion of a resource), versioning (process of keeping count of
> sequential derivations)
>
> 2. what is the history of the resource (from the point of creation)
>  -> important for authenticity
>  -> includes origin, possession and the acts of transfer
>
> Both have implications for trust, and can be used for accountability, among
> other things.
>
> I find this part on recommendations of a provenance framework quite nice:
>
> http://www.w3.org/2005/Incubator/prov/XGR-prov-20101214/#A_Roadmap_for_Provenance_on_the_Web
>
> but get less excited when i see the collection of "provenance concepts"
> http://www.w3.org/2005/Incubator/prov/XGR-prov-20101214/#Recommendations (section
> 4)
>
> particularly because we need to simplify the discourse such that we consider
>
> an event (for 1 above)
>  - participants (and their roles; e.g. agents, targets, products)
>  - locations
>  - time instants (e.g. action timestamps) and durations (processual
> attributes)
>
> and a sequence of events (for both 1 and 2 above)
>
> this would certainly help to generate a specification with a minimal set of
> classes and relations to express this kind of information.
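
A minimal sketch of the event-centred view described above, assuming Python
dataclasses. The class names, role strings and example values are
illustrative assumptions only; they are not terms from the provenance WG
model or from the incubator report.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Participant:
    name: str  # who or what took part in the event
    role: str  # hypothetical role labels: "agent", "target", "product"


@dataclass
class Event:
    label: str
    participants: List[Participant]
    location: Optional[str] = None
    start: Optional[datetime] = None      # time instant
    duration: Optional[timedelta] = None  # processual attribute

    def involves(self, name: str, role: Optional[str] = None) -> bool:
        """True if `name` takes part, optionally in a specific role."""
        return any(
            p.name == name and (role is None or p.role == role)
            for p in self.participants
        )


def history(events: List[Event], resource: str) -> List[Event]:
    """Aspect 2: timestamped events involving the resource, oldest first."""
    relevant = [e for e in events if e.involves(resource) and e.start is not None]
    return sorted(relevant, key=lambda e: e.start)


def creation_event(events: List[Event], resource: str) -> Optional[Event]:
    """Aspect 1: earliest event in which the resource appears as a product."""
    made = [e for e in history(events, resource) if e.involves(resource, "product")]
    return made[0] if made else None


# Invented example data, listed out of order on purpose.
events = [
    Event(
        "transfer",
        [Participant("lab-b", "agent"), Participant("dataset-1", "target")],
        location="Galway",
        start=datetime(2011, 9, 23, 9, 0),
    ),
    Event(
        "sequencing run",
        [
            Participant("alice", "agent"),
            Participant("sample-7", "target"),
            Participant("dataset-1", "product"),
        ],
        location="Ottawa",
        start=datetime(2011, 9, 22, 14, 0),
        duration=timedelta(hours=3),
    ),
]

for event in history(events, "dataset-1"):
    print(event.start.isoformat(), event.label, event.location)
```

Here one event type with role-tagged participants covers both questions:
creation is the first event where the resource is a product, and history is
the ordered sequence of all events that mention it.
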
>
> Now, I'm writing this late at night, and I appreciate that I may not have
> considered all the issues that the provenance group has (along with others
> who have written about the subject), but perhaps there are still some good
> discussions to be had wrt provenance and how we formally represent it, as it
> is of strategic importance to the HCLSIG in our current and future efforts.
>
> Best,
>
> m.
>
> On Thu, Sep 22, 2011 at 7:13 PM, Satya Sahoo <sahoo.2@wright.edu> wrote:
>
>> Hi Joanne and Scott,
>> In the Provenance incubator we agreed on the following definition:
>> Provenance of a resource is a record that describes entities and processes
>> involved in producing and delivering or otherwise influencing that resource.
>> Provenance provides a critical foundation for assessing authenticity,
>> enabling trust, and allowing reproducibility. Provenance assertions are a
>> form of contextual metadata and can themselves become important records with
>> their own provenance.
>>
>>
>>
>> A few key points in the above definition will hopefully help to "draw
>> the line in the definition": metadata; a record of past events (the
>> temporal dimension is critical to the definition of provenance and also has
>> consensus in the current provenance WG [1]); a foundation for trust and
>> reproducibility (these are often confused as synonymous with provenance
>> but are actually derived from, or are uses of, provenance); and
>> contextual, in other words each domain/application defines its own set
>> of provenance terms.
>>
>>
>> Hope this helps.
>>
>>
>> Best,
>> Satya Sahoo
>> http://cci.case.edu/cci/index.php/Satya_Sahoo
>>
>>
>> [1] http://www.w3.org/2011/prov/wiki/Main_Page
>>
>>
>> ----- Original Message -----
>> From: Joanne Luciano <jluciano@gmail.com>
>> Date: Thursday, September 22, 2011 12:34 pm
>> Subject: The Provenance Spectrum....
>> To: public-semweb-lifesci hcls <public-semweb-lifesci@w3.org>
>>
>>
>> > Thank you Scott for suggesting that we move the discussion to the
>> mailing list... and to include the provenance working group.
>> >
>>
>> > What is provenance? Where do we draw the line in the definition?
>> >
>>
>> > Our HCLSIG TMO discussion today was reminiscent of the 'what is an
>> ontology?' discussion.  And this is a good discussion (that we just had on the
>> HCLSIG call).  I would like to settle the argument of what is provenance and
>> what isn't by suggesting (and claiming, if no one else has already claimed,
>> and if so, then agreeing) that there is a Provenance Spectrum and
>> inviting my esteemed colleagues to fill in the provenance spectrum ... and
>> let's create a nice graphic to go with it that we can all use.
>> >
>>
>> > Let's add UTILITY, if we can, so as we move across the spectrum, we get
>> more out of including more into the provenance definition.  I noticed that
>> many of us have spent a lot of time creating metadata terms and standards to
>> address the problems with legacy data for the purpose of integration, for
>> example, but that if these metadata are included as "provenance" then many
>> questions become easier to answer.  I learned this when I went to the IPAW
>> conference last June (2010) at RPI.  I suggest people check the papers
>> there. I was impressed.
>> >
>>
>> > And let's invite the Provenance working group to the discussion (or join
>> their discussion).
>> >
>>
>> > Cheers,
>> > Joanne
>> >
>>
>>
>> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> > Joanne S. Luciano, PhD            Rensselaer Polytechnic Institute
>> > Research Associate Professor      110 8th Street, Winslow 2143
>> > Tetherless World Constellation    Troy, NY 12180, USA
>> > Deputy Director, WebScience       Email: jluciano@cs.rpi.edu
>> > Office Tel. +1.518.276.4939       Global Tel. +1.617.440.4364 (skypeIn)
>> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> >
>>
>>
>
>
> --
> Michel Dumontier
> Associate Professor of Bioinformatics
> Carleton University
> http://dumontierlab.com
>



-- 
Helena F. Deus
Post-Doctoral Researcher at DERI/NUIG
http://lenadeus.info/
Received on Saturday, 1 October 2011 11:20:23 GMT