
Re: Absence of key scientific spatial data formats within common formats to implementation of Best Practices [SEC=UNCLASSIFIED]

From: Bruce Bannerman <B.Bannerman@bom.gov.au>
Date: Tue, 29 Mar 2016 02:21:15 +0000
To: lewis john mcgibbney <lewismc@apache.org>
CC: SDW WG Public List <public-sdw-wg@w3.org>
Message-ID: <D3202A5D.28522%B.Bannerman@bom.gov.au>

Hi Lewis,

More inline below.


From: lewis john mcgibbney <lewismc@apache.org>
Date: Saturday, 26 March 2016 at 08:24
To: Bruce Bannerman <B.Bannerman@bom.gov.au>
Cc: SDW WG Public List <public-sdw-wg@w3.org>
Subject: Re: Absence of key scientific spatial data formats within common formats to implementation of Best Practices [SEC=UNCLASSIFIED]

Hi Bruce,
Thanks for your response. I'll make some comments below

On Wed, Mar 23, 2016 at 1:26 PM, Bruce Bannerman <B.Bannerman@bom.gov.au> wrote:
Hi Lewis,

I have still to find the time to review the latest document and provide comment.

OK. I would like to see your comments when you do and I suppose I will have some replies.

However regarding this issue, we have no intention of moving away from the scientific data formats that we use within our large data holdings.

Same here. This is where I see some value in addressing the following area in order to still drive value from the spatial data encoded within the dataset(s). Many (virtually all) of our dataset landing pages, believe it or not, still do not have any kind of semantic markup and hence are relatively undiscovered outside of the NASA Distributed Active Archive Center (DAAC) portals. An example is the landing page at [0], which describes the SeaWinds on QuikSCAT Enhanced Resolution Regionally Gridded Sigma-0 (BYU, D. Long) dataset. When I extract the implicit semantic markup from within this page (using Apache Any23 [1]) I get very few meaningful relationships which I can utilize programmatically. The extracted result in JSON [2] shows you that.

I do, however, also think that moving towards a hypermedia-based mechanism for describing the data granules behind these dataset landing pages would be useful. I found Linda's recent post on the Dutch crawling task very interesting in this regard.

What are your thoughts here? Do you describe your datasets in any meaningful way? I think that there is a HUGE amount of work to be done here to improve programmatic interpretation of the underlying scientific data.

[0] http://podaac.jpl.nasa.gov/dataset/QUIKSCAT_BYU_L3_OW_SIGMA0_ENHANCED?ids=Measurement&values=Sea%20Ice
[1] http://any23.apache.org
[2] https://paste.apache.org/lphl

Regarding describing our datasets:

  *   We don't do this as well as we could. We will begin addressing this in the near future. Much of our current work is either internally focussed, or at a much too granular level.
  *   We intend describing our data sets using ISO 19115 with support for several profiles, including WMO and ANZLIC.
  *   I can't see us moving away from this paradigm, but there is certainly potential for LinkedData approaches as alternate methods of discovering our data.

But this is also only part of the issue:

  *   We also need a mechanism to better understand the context of our observations (e.g. What sensor; what model; when was it last calibrated; maintained; what sensor maintenance process and responsible party; what observation process etc). We will be using the new WMO WIGOS Observations Metadata standard to support this concept.
  *   As discussed before on this list (and in the SDWWG Climate data related use case), there is also the issue of data provenance.
  *   Data Quality and IP issues will also become a big issue, particularly with the increasing use of mixed Bureau and 3rd party observations and the subsequent derived products that we create from these observations.

If anything, I expect that we will need to work with our peers to define formal data format definitions that are consistent with modern spatial requirements, e.g. full support for Spatial Reference Systems and other CRS definitions, and that don't constrain our ability to adequately portray the complexity of our data.

I agree here.

I expect that we'll probably need to do this via OGC processes. We want to ensure that the data that we collect and archive now will still be accessible for our key stakeholders who have not yet been born.

This view is consistent across the entire NASA data archival spectrum as well... and as long term data stewards this is a logical viewpoint.

Further, I expect that we'll need to go further and work with our peers to agree on semantic definitions of the content that we portray for each relevant domain and its inter-relationships with other domains.

This sounds like the next step... the issues we're discussing above seem like the precursor. Am I correct?

Not necessarily, consider the work that has been undertaken on GeoSciML, WaterML etc.

A lot of this is based on communities of a common interest getting together and agreeing on and using common terms and concepts.

It takes many, many years of community building to reach the required consensus.

This is similar in concept to what the hydrology community have done with WaterML 2, but I expect that we'll need to take it further, particularly the inter-domain relationships.

Yes, I really like the ongoing work on hydrology with WaterML2 and this is an excellent point. It is however again, in my own opinion, something which follows on from the above.

When we are trying to understand global systems and their interaction with other systems, and we are doing this with our peers in distributed data collections and services, the need for formal data definitions becomes critical. This is especially so if we want global, federated, data sets ***and dynamic services*** describing specific phenomena.

Do you have any examples from the field of Meteorology? I would be interested to see if I could pick out any examples more familiar to other aspects of Earth Science, Physical Oceanography or something else a bit closer to 'home' for my current working agenda.

The closest that I can point to at the moment is the work that we have been doing in WMO on WMO #1131, Climate Data Management System Specifications http://library.wmo.int/opac/index.php?lvl=notice_display&id=16300

There is also related work, e.g.:

  *   Foundation data governance and data modelling work within WMO that Jeremy Tandy is leading
  *   Foundation work that has been undertaken by Australia's CSIRO over many years: https://www.seegrid.csiro.au/wiki/Siss/WebHome
  *   And to be honest, much of the underpinning OGC standards efforts that we build on top of.

This is really laying the groundwork, and it will take many years to get there with truly federated data and data services.

It will allow us to spend much less wasted time in getting data prepared for global analysis and much more time on the actual analysis and understanding the implications of the results.

Thank you for the very meaningful conversation. Looking forward to any follow up if you have it.

In the meantime, I come back to my main question. Is there any reason from across the group why the current matrix of spatial data formats doesn't include formats such as GRIB, HDF4, HDF5, netCDF3, netCDF4, etc? These are used pervasively throughout the sciences and I am very surprised to see them absent.
Thanks folks.
Received on Tuesday, 29 March 2016 02:21:51 UTC
