Re: concept illustrations for the data journalism example

From: Simon Miles <simon.miles@kcl.ac.uk>
Date: Wed, 18 May 2011 08:38:07 +0100
Message-ID: <BANLkTinq=ZBDa_t2uncbx1bGg_jGOXKvQQ@mail.gmail.com>
To: public-prov-wg@w3.org

Hi Martin, Graham,

Coming from the CompSci perspective, I am not clear why we would
regard raw observations as "terminal" in provenance at all (regardless
of whether what is raw can be precisely defined).

Wouldn't questions such as these require data which was part of the
provenance of observation data?
 - what led to the sensor which produced the observation being
calibrated as it was? e.g. prior experiments of a similar kind may
have produced good results with that calibration so we re-used it
 - how was the sensor which produced the observation manufactured?
e.g. if it was part of a bad batch, the documentation of the
manufacturing process may provide the reason for your strange ultimate
results
 - why was the sensor producing the observation positioned as it was?
e.g. a satellite's position may be influenced by the documented
design decisions of multiple parties using it and by reported political
agreements, in addition to physical world effects such as gravity and
momentum

The details are probably not too important for this WG, but it seems
important that the model and querying do not assume that observations
are independent of other data, else we exclude a range of use cases.
If that is accepted, then wouldn't observations be dealt with in the
same way as any other data with regard to provenance?
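
To make the point concrete, here is a minimal sketch in Python. It is an
illustration only: the item names, the PROVENANCE mapping and the
ancestors() helper are all hypothetical and not part of any WG model or
vocabulary. The observation is an ordinary node with its own sources, and
whether a query stops at it is a choice made at query time.

```python
from collections import deque

# Hypothetical "was influenced by" edges: item -> its direct sources.
# All identifiers are made up for illustration.
PROVENANCE = {
    "report": ["cleaned_data"],
    "cleaned_data": ["observation"],
    "observation": ["sensor_calibration", "sensor_position"],
    "sensor_calibration": ["prior_experiment_results"],
    "sensor_position": ["design_decisions", "political_agreement"],
}


def ancestors(item, graph, terminal=frozenset()):
    """Return everything `item` depends on, in breadth-first order.

    Items named in `terminal` are reported but not expanded, which mimics
    treating them as the end of the provenance chain.
    """
    seen = []
    queue = deque([item])
    while queue:
        current = queue.popleft()
        if current in terminal:
            continue  # do not look behind a terminal item
        for source in graph.get(current, []):
            if source not in seen:
                seen.append(source)
                queue.append(source)
    return seen


# Observation treated like any other data: the query reaches calibration,
# positioning and whatever lies behind them.
full = ancestors("report", PROVENANCE)

# Observation treated as terminal: the query stops there.
cut = ancestors("report", PROVENANCE, terminal={"observation"})
```

With the observation marked terminal, the calibration and positioning
questions above cannot be answered from the query result; without it, they
can, and nothing in the data itself had to change.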

Thanks,
Simon

On 13 May 2011 08:04, Graham Klyne <graham.klyne@zoo.ox.ac.uk> wrote:
> Hi Martin,
>
> FWIW, one of the themes I have at the back of my mind is to assess how the
> emerging provenance model and vocabularies can be used with the CIDOC-CRM style
> of event-mediated structure, but not as something I think should be a prime
> concern of this working group ... at least not at this early stage.
>
> I've noticed that there seem to be some subtle differences of expectation around
> the term "provenance" between the CompSci "provenance" community (mostly
> represented here), and the other communities who use CRM and similar descriptive
> frameworks.  I haven't yet fully recognized the extent of the differences; it's
> something else I hope to tease out over time.
>
> It's a nice point you make about the "terminal" nature of raw observations,
> though I'm wondering how clear the dividing line will be.  Imagine things like
> the Large Hadron Collider experiments where what is being observed is a
> secondary effect of some primary event, the data from which is passed through
> several stages of detection and correlation/condensation hardware before getting
> close to being presented to a computer.  I think there's scope here to move the
> provenance terminal boundary according to one's current needs.
>
> #g
> --
>
>
> martin wrote:
>> We have provenance applications of data capturing and scientific
>> observation in medicine, archaeology and even some in satellite data, and
>> a major application in empirical and synthetic creation of 3D Models by
>> various methods.
>>
>> I regard it as important to include and differentiate sufficiently the
>> event of capturing digital data via devices in a physical environment.
>>
>> Whereas in Digital-to-Digital processing the environment and place of the
>> event play a negligible role, the scientific or even forensic
>> interpretation of measured data critically depends on environmental
>> factors and on the analogue characteristics of the devices involved. We
>> may even regard the individual history of calibration or degradation of an
>> individual device as relevant.
>>
>> The challenge is not to register all those factors explicitly per event
>> in the provenance data, but to provide a few core data that can lead the
>> interested user to such details in other sources. For instance, a
>> date-time, place (geo-reference) and/or the observed object (patient!) may
>> already be sufficient, depending on the case. This would provide the
>> interested scientist with enough clues to find more data about the
>> conditions at that place and time from other sources.
>>
>> Another distinct feature of measurement or digitization events
>> ("acquisition") is that they are by nature terminal in the provenance
>> chain: they are the unique transition from the real to the digital, and
>> all derivatives should include a reference to this ultimate source and its
>> circumstances. For instance, an image of Obama typically stays an image of
>> Obama regardless of the number of processing steps. So, the reference to
>> the object captured in the initial event is a vital parameter for the
>> whole chain to follow. Querying a provenance chain should ultimately stop
>> at the acquisition event or the invention of data, which may be due to an
>> artistic process, a scientific simulation process or a technical planning
>> process. Therefore reasoning on "terminal" events differs from that about
>> general processing events.
>>
>> To make things as simple as possible, I propose to include the simple
>> taking of a news photo in the journalism example. This is simple, but
>> includes all necessary features to discuss the core concept.
>>
>> It need not be as weird as:
>> http://www.cidoc-crm.org/crm_core/core_examples/hoagland.htm
>> or http://www.cidoc-crm.org/crm_core/core_examples/henrichsen.htm.....
>>
>> Best,
>>
>> Martin
>>
>



-- 
Dr Simon Miles
Lecturer, Department of Informatics
King's College London, WC2R 2LS, UK
+44 (0)20 7848 1166
Received on Wednesday, 18 May 2011 07:38:35 GMT