Re: UCR isssue: Is provenance in scope? [SEC=UNCLASSIFIED]

From: Bruce Bannerman <B.Bannerman@bom.gov.au>
Date: Thu, 21 May 2015 11:22:20 +0000
To: "Kerry.Taylor@csiro.au" <Kerry.Taylor@csiro.au>
CC: SDW WG <public-sdw-wg@w3.org>
Message-ID: <1432207339957.85163@bom.gov.au>
Hi Kerry,

Thanks for agreeing to post this to the wiki for me, while I work out how to get registered for the working group.

I will not get time to do the second part of your request (i.e. the analysis) until next week.



Provenance of climate data (Best Practice, Time-series, Observations, Coverages, Data Provenance, DOI, Metadata)

Contributed by: Bruce Bannerman

This use case is inspired by one of the conclusions of the UK Parliamentary inquiry into 'ClimateGate' (http://www.publications.parliament.uk/pa/cm201011/cmselect/cmsctech/444/44404.htm):

"...It is not standard practice in climate science to publish the raw data and the computer code in academic papers. However, climate science is a matter of great importance and the quality of the science should be irreproachable. We therefore consider that climate scientists should take steps to make available all the data that support their work (including raw data) and full methodological workings (including the computer codes...)."

When a Climate Scientist publishes a paper, they need to be able to refer reviewers to the source data and software source code that underpin the assertions made within the paper. Climate data is typically time-series and can be quite complex. Data can be sourced from a single National Meteorological and Hydrological Service (NMHS), or from a number of NMHSs. Software source code is typically stored within a software revision control repository, such as Git.

Climate data may comprise all of the following:

1.  A time-series of observations of a specific phenomenon at a single sensor, including:

  *   estimations of the value of an observed property
  *   a detailed understanding of the conditions under which the observation was made
  *   any changes that have been made to the observation (e.g. due to Quality Assurance processes)

2.  A collection of the time-series observations described at 1, of the same phenomenon, at the same time steps, perhaps in the form of a discrete coverage (time-series).

3.  A time-series representing the distribution of values of the collection of time-series observations, perhaps represented as:

  *   a continuous two-dimensional coverage, or
  *   continuous coverages as three-dimensional cubes, or
  *   a continuous coverage as n-dimensional models, or
  *   an ensemble, comprising a number of models,

    where these time-series coverages / cubes / models represent the outputs of some analytical process.

So, using the description of climate data above, when a paper is published, the scientist needs to be able to refer reviewers to:

  *   the analytical data that underpins the paper (perhaps the n-dimensional continuous coverage time-series at #3).
  *   the quality-assured observations data at #2 and #1 from which the continuous coverages at #3 were derived (with details as to why each change to the quality-assured observations described at #1 was made).
  *   the ‘raw’ observations data described at #1, with details as to the conditions, sensors etc. under which each observation was made.
  *   the version of the software source code that was used to manipulate the data at #1, #2, and #3.
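
As a rough illustration of the chain of references listed above, the following Python sketch links each artefact to the artefacts it was derived from and to the software revision that produced it. It is only a sketch: the class name, the `example:` identifiers, the change note, and the commit hash `abc1234` are hypothetical placeholders, not part of any WMO, OGC, or W3C model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Artefact:
    """A dataset or paper, with links back to what it was derived from."""
    identifier: str                           # hypothetical identifier, e.g. a DOI or URI
    description: str
    derived_from: List["Artefact"] = field(default_factory=list)
    software_revision: Optional[str] = None   # e.g. a Git commit hash (placeholder)
    change_notes: List[str] = field(default_factory=list)  # why QA changes were made


def lineage(artefact: Artefact) -> List[Artefact]:
    """Return the artefact followed by everything it was derived from (depth-first)."""
    chain = [artefact]
    for source in artefact.derived_from:
        chain.extend(lineage(source))
    return chain


# Hypothetical example following items #1 to #3 of the use case.
raw = Artefact("example:raw-obs", "Raw observations at a single sensor (#1)")
qa = Artefact(
    "example:qa-obs",
    "Quality-assured observations (#1, #2)",
    derived_from=[raw],
    software_revision="abc1234",
    change_notes=["hypothetical note: value adjusted after sensor relocation"],
)
coverage = Artefact(
    "example:coverage",
    "Continuous coverage time-series (#3)",
    derived_from=[qa],
    software_revision="abc1234",
)
paper = Artefact("example:paper", "Published paper", derived_from=[coverage])

# Walk from the paper back to the raw observations.
for item in lineage(paper):
    print(item.identifier, "|", item.software_revision or "no software recorded")
```

A reviewer starting from the paper could follow `derived_from` back to the raw observations, checking the recorded software revision and change notes at each step.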

This is not a trivial data management problem to address; however, its resolution will provide a solid data management grounding for future climate science and help address much spurious debate.

Parts of the puzzle are currently being worked on, e.g.:

  *   The World Meteorological Organization (WMO) have published WMO No. 1131, Climate Data Management Systems Specifications, which provide a high-level conceptual architecture addressing many of the data management issues described above. See: http://library.wmo.int/opac/index.php?lvl=notice_display&id=16300#.VV260-ePJVx

  *   WMO and OGC have developed, and are developing, relevant logical data models: ISO 19156 Observations and Measurements, OGC Timeseries, and WMO METCE. The last two are based on Observations and Measurements.

  *   WMO have released a standard for describing Observations Metadata, called WIGOS Metadata.

  *   WMO are about to start work on a logical data model based on Observations and Measurements and Timeseries for describing WMO Observations. A future iteration of this model will need to cater for data provenance.

  *   W3C have developed PROV-O, which has considerable potential for describing data provenance.
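
To illustrate the last point, a single derivation step (for example, quality-assured observations derived from raw observations by a particular run of some software) might be written with PROV-O terms roughly as follows. This is a minimal sketch using only the Python standard library to emit N-Triples text. The `http://example.org/climate/` namespace and all resource names are hypothetical, and modelling the software as a `prov:SoftwareAgent` is just one possible choice.

```python
from typing import List

PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/climate/"  # hypothetical namespace for illustration only


def triple(subject: str, predicate: str, obj: str) -> str:
    """Format one N-Triples statement whose three terms are all IRIs."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def describe_derivation(derived: str, source: str, activity: str, software: str) -> List[str]:
    """Describe 'derived' as produced from 'source' by 'activity' run with 'software'."""
    return [
        triple(EX + derived, RDF_TYPE, PROV + "Entity"),
        triple(EX + source, RDF_TYPE, PROV + "Entity"),
        triple(EX + activity, RDF_TYPE, PROV + "Activity"),
        triple(EX + software, RDF_TYPE, PROV + "SoftwareAgent"),
        triple(EX + derived, PROV + "wasDerivedFrom", EX + source),
        triple(EX + derived, PROV + "wasGeneratedBy", EX + activity),
        triple(EX + activity, PROV + "used", EX + source),
        triple(EX + activity, PROV + "wasAssociatedWith", EX + software),
    ]


# Hypothetical resource names; the suffix stands in for a pinned software revision.
statements = describe_derivation("qa-obs", "raw-obs", "qa-run-1", "qa-code-abc1234")
for line in statements:
    print(line)
```

Repeating this step for each link (raw observations, quality-assured observations, coverages, paper) would give a chain that can be followed from the paper back to the raw data.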

The missing part is:

How can the provenance of the collection of climate data, and of the software used to manipulate it, be best modeled and, in the future, found via the Internet?

The resolution of this issue will be of relevance to many domains.


From: Kerry.Taylor@csiro.au <Kerry.Taylor@csiro.au>
Sent: Wednesday, 20 May 2015 9:41 PM
To: Bruce Bannerman; jlieberman@tumblingwalls.com
Cc: public-sdw-wg@w3.org
Subject: RE: UCR isssue: Is provenance in scope? [SEC=UNCLASSIFIED]

Hi Bruce, Josh,

I, for one, would love to see that use case! I will do what I can to hold the presses for you – can you get it on the wiki in the next 24 hours? https://www.w3.org/2015/spatial/wiki/Working_Use_Cases And could you also do the analysis of requirements in the spreadsheet? https://docs.google.com/spreadsheets/d/1PSnpJYQDgsdgZgPJEfUU0EhVfgFFYGc1WL4xUX9Dunk/edit?usp=sharing

I have done a lot of work on provenance in the context of Bioregional assessments and other things with GA.
I was also part of that work in publishing BoM’s ACORN-SAT as linked data, and it would have been lovely to do that with provenance too.

However, I do not think we are going to be “doing” provenance in this group. I would just like to know that what we are doing neatly docks to PROV-O (the W3C PROV Ontology), and I know that will not be the case unless we make it so. See, for example, http://knoesis.org/ssn2014/paper_9.pdf. It would be great, too, if Josh is watching out for

“reference provenance of spatial data must address not only how a feature and a spatial such as a geometry were formed, but how they were associated and under what assumptions for representation of the physical world.”

so that we can have some confidence that it will be possible to represent this, but I still don’t see the doing of that as in scope (wrt our charter). We should consider it for future work, which we can certainly recommend coming out of this group.
Can I suggest that you, Josh, note it on the relevant “wish list” on the main page of the wiki, so it does not get forgotten? Or, put it as an “issue” on the tracker to ensure it gets more attention if you prefer. We can put it on a meeting agenda, but can it wait for the UCR to stabilise first?

Didn’t I meet you, Bruce, in the Melbourne office earlier this year? If you are in Canberra some time, it would be nice to catch up on these matters.


From: Bruce Bannerman [mailto:B.Bannerman@bom.gov.au]
Sent: Tuesday, 19 May 2015 8:58 AM
To: Taylor, Kerry (Digital, Acton); jlieberman@tumblingwalls.com
Cc: public-sdw-wg@w3.org
Subject: Re: UCR isssue: Is provenance in scope? [SEC=UNCLASSIFIED]

Hi Kerry,

Provenance is particularly important for climate data related issues, and no doubt for many more domains as well.

From a climate perspective, when I publish a scientific paper, I need to be able to reference all the data that underpins the analysis that the paper was based on. So this may be:

  *   Published paper
  *   Claims in Published paper based on Analytical Data (perhaps a multi dimensional array/grid/coverage)
  *   Analytical data is derived from quality assured observations data (with details as to why each change to the QA obs was made)
  *   Quality assured observations data is derived from ‘raw’ observations data which has details as to the conditions, sensors etc that the observation was made under.
There are many nuances to provenance here, including an understanding of which algorithms were used to process the data and, ideally, a reference to the source code of these algorithms as they were at the time of the analysis.

And to make things more interesting, the analysis and data are typically time-series (observations and coverages).

This reminds me that I posted about a potential climate use case several months ago, but forgot to add it.

If there is still interest in this, let me know and I’ll put something together.


From: "Kerry.Taylor@csiro.au" <Kerry.Taylor@csiro.au>
Date: Wednesday, 13 May 2015 23:59
To: "jlieberman@tumblingwalls.com" <jlieberman@tumblingwalls.com>
Cc: "public-sdw-wg@w3.org" <public-sdw-wg@w3.org>
Subject: RE: UCR isssue: Is provenance in scope?
Resent-From: <public-sdw-wg@w3.org>
Resent-Date: Thursday, 14 May 2015 00:00

(Resending - missed the list cc)

From: Taylor, Kerry (Digital, Acton)
Sent: Wednesday, 13 May 2015 10:53 PM
To: 'Joshua Lieberman'
Subject: RE: UCR isssue: Is provenance in scope?

I think we need only to make sure (and perhaps show how) our deliverables can deal with provenance by attaching/linking some W3C PROV-O. I would not suggest we need to show how to encode spatial data provenance in PROV-O, though.
Provenance is a first-class issue in a great many spatial data applications.


From: Joshua Lieberman [mailto:jlieberman@tumblingwalls.com]
Sent: Wednesday, 13 May 2015 10:38 PM
To: Frans Knibbe
Cc: SDW WG Public List
Subject: Re: UCR isssue: Is provenance in scope?

Perhaps we can discuss the general issue of scope today on the call. There are many aspects of spatiotemporal data that in general are similar to issues with other data, but that clearly require specialization for our case. For example, reference provenance of spatial data must address not only how a feature and a spatial such as a geometry were formed, but how they were associated and under what assumptions for representation of the physical world. This is quite specialized to spatial and a significant semantic interoperability issue. We will miss addressing critical points in our work if we subsume them too often into general ones and deem them out of scope.


Joshua Lieberman, Ph.D.
Tumbling Walls
+1 617 431 6431

On May 13, 2015, at 8:21 AM, Frans Knibbe <frans.knibbe@geodan.nl> wrote:

Hello all,

I have raised an issue for the UCR document: ISSUE-11 <http://www.w3.org/2015/spatial/track/issues/11>.
Again, all help in getting this issue resolved is very welcome.


Frans Knibbe
President Kennedylaan 1
1079 MB Amsterdam (NL)

T +31 (0)20 - 5711 347
E frans.knibbe@geodan.nl
Received on Thursday, 21 May 2015 11:22:56 UTC
