
Re: Absence of key scientific spatial data formats within common formats to implementation of Best Practices [SEC=UNCLASSIFIED]

From: Ed Parsons <eparsons@google.com>
Date: Tue, 29 Mar 2016 08:55:36 +0000
Message-ID: <CAHrFjcn-5an8H-cdfi2Z7gALNTwoFYU=g2E+jQghDxQQ+4+nmA@mail.gmail.com>
To: Linda van den Brink <l.vandenbrink@geonovum.nl>, Bruce Bannerman <B.Bannerman@bom.gov.au>, lewis john mcgibbney <lewismc@apache.org>
Cc: SDW WG Public List <public-sdw-wg@w3.org>
Hello all,

This is worth discussing in the context of the intended audience for the BP
document. My personal view is that the formats you list are of interest
to a predominantly small (in relative terms) scientific audience?

Ed

On Tue, 29 Mar 2016, 09:11 Linda van den Brink, <l.vandenbrink@geonovum.nl>
wrote:

> Hi Bruce, Lewis,
>
>
>
> The scope and purpose of the common formats list in the BP hasn’t been
> discussed exhaustively. What’s currently in the BP is a first draft, or
> rather two; one list by Ed Parsons and one by Clemens Portele. Including
> scientific spatial data formats in these lists hasn’t come up yet. It
> could be argued that scientific formats weren’t considered ‘common’
> formats, but as I said it hasn’t come up yet. Both lists only list vector
> data formats.
>
>
>
> I have created an issue about this so that we don’t forget to address
> this.
>
> https://github.com/w3c/sdw/issues/237
>
>
>
> Linda
>
>
>
> *From:* Bruce Bannerman [mailto:B.Bannerman@bom.gov.au]
> *Sent:* Tuesday, 29 March 2016 04:21
> *To:* lewis john mcgibbney
> *Cc:* SDW WG Public List
> *Subject:* Re: Absence of key scientific spatial data formats within
> common formats to implementation of Best Practices [SEC=UNCLASSIFIED]
>
>
>
> Hi Lewis,
>
>
>
> More inline below.
>
>
>
> Bruce
>
>
>
>
>
> *From: *lewis john mcgibbney <lewismc@apache.org>
> *Date: *Saturday, 26 March 2016 at 08:24
> *To: *Bruce Bannerman <B.Bannerman@bom.gov.au>
> *Cc: *SDW WG Public List <public-sdw-wg@w3.org>
> *Subject: *Re: Absence of key scientific spatial data formats within
> common formats to implementation of Best Practices [SEC=UNCLASSIFIED]
>
>
>
> Hi Bruce,
>
> Thanks for your response. I'll make some comments below
>
>
>
> On Wed, Mar 23, 2016 at 1:26 PM, Bruce Bannerman <B.Bannerman@bom.gov.au>
> wrote:
>
> Hi Lewis,
>
> I have still to find the time to review the latest document and provide
> comment.
>
>
>
> OK. I would like to see your comments when you do and I suppose I will
> have some replies.
>
>
>
>
> However regarding this issue, we have no intention of moving away from the
> scientific data formats that we use within our large data holdings.
>
>
>
> Same here. This is where I see some value in addressing the following area
> in order to still derive value from the spatial data encoded within the
> dataset(s). Many (virtually all) of our dataset landing pages, believe it
> or not, still do not have any kind of semantic markup and hence are
> relatively undiscovered outside of the NASA Distributed Active Archive
> Center (DAAC) portals. An example is the landing page at [0], which
> describes the SeaWinds on QuikSCAT Enhanced Resolution Regionally Gridded
> Sigma-0 (BYU, D. Long) dataset. When I extract the implicit semantic markup
> from within this page (using Apache Any23 [1]) I get very few meaningful
> relationships which I can utilize programmatically. The extracted result in
> JSON [2] shows you that.
>
> I do, however, also think that moving towards a hypermedia-based mechanism
> for describing the data granules behind these dataset landing pages is
> useful. I found Linda's recent post on the Dutch crawling task very
> interesting in this regard.
>
> What are your thoughts here? Do you describe your datasets in any
> meaningful way? I think that there is a HUGE amount of work to be done
> here to improve programmatic interpretation of the underlying scientific
> data.
>
>
> [0]
> http://podaac.jpl.nasa.gov/dataset/QUIKSCAT_BYU_L3_OW_SIGMA0_ENHANCED?ids=Measurement&values=Sea%20Ice
> [1] http://any23.apache.org
> [2] https://paste.apache.org/lphl
>
> Regarding describing our datasets:
>
>    - We don’t do this as well as we could. We will begin addressing this
>    in the near future. Much of our current work is either internally focussed,
>    or at a much too granular level.
>    - We intend describing our data sets using ISO 19115 with support for
>    several profiles, including WMO and ANZLIC.
>    - I can’t see us moving away from this paradigm, but there is
>    certainly potential for LinkedData approaches as alternate methods of
>    discovering our data.
>
> But this is also only part of the issue:
>
>    - We also need a mechanism to better understand the context of our
>    observations (e.g. What sensor; what model; when was it last calibrated;
>    maintained; what sensor maintenance process and responsible party; what
>    observation process etc). We will be using the new WMO WIGOS Observations
>    Metadata standard to support this concept.
>    - As discussed before on this list (and in the SDWWG Climate data
>    related use case), there is also the issue of data provenance.
>    - Data quality and IP will also become big issues,
>    particularly with the increasing use of mixed Bureau and 3rd party
>    observations and the subsequent derived products that we create from these
>    observations.
>
> If anything, I expect that we will need to work with our peers to define
> formal data format definitions that are consistent with modern spatial
> requirements, e.g. full support for Spatial Reference Systems and other CRS
> definitions, and that don't constrain our ability to adequately portray the
> complexity of our data.
>
>
>
> I agree here.
>
>
>
> I expect that we'll probably need to do this via OGC processes. We want to
> ensure that the data that we collect and archive now will still be
> accessible for our key stakeholders who have not yet been born.
>
>
>
> This view is consistent across the entire NASA data archival spectrum as
> well... and as long term data stewards this is a logical viewpoint.
>
>
>
>
> Further, I expect that we'll need to go further and work with our peers to
> agree on semantic definitions of the content that we portray for each
> relevant domain and its inter-relationships with other domains.
>
>
>
> This sounds like the next step... the issues we're discussing above seem
> like the precursor. Am I correct?
>
>
>
> Not necessarily, consider the work that has been undertaken on GeoSciML,
> WaterML etc.
>
>
>
> A lot of this is based on communities of a common interest getting
> together and agreeing on and using common terms and concepts.
>
>
>
> It takes many, many years of community building to reach the required
> consensus.
>
> This is similar in concept to what the hydrology community have done with
> WaterML 2, but I expect that we'll need to take it further, particularly
> the inter-domain relationships.
>
>
>
> Yes, I really like ongoing work on hydrology with WaterML2 and this is an
> excellent point. It is however again, in my own opinion, something which
> follows on from the above.
>
>
>
>
> When we are trying to understand global systems and their interaction with
> other systems, and we are doing this with our peers in distributed data
> collections and services, the need for formal data definitions becomes
> critical. This is especially so if we want global, federated, data sets
> ***and dynamic services*** describing specific phenomena.
>
>
>
> Do you have any examples from the field of Meteorology? I would be
> interested to see if I could pick out any examples more familiar to other
> aspects of Earth Science, Physical Oceanography or something else a bit
> closer to 'home' for my current working agenda.
>
>
>
>
>
> The closest that I can point to at the moment is the work that we have
> been doing in WMO on WMO #1131, Climate Data Management System
> Specifications:
> http://library.wmo.int/opac/index.php?lvl=notice_display&id=16300
>
>
>
> There is also related work, e.g.:
>
>    - Foundation data governance and data modelling work within WMO that
>    Jeremy Tandy is leading
>    - Foundation work that has been undertaken by Australia’s CSIRO over
>    many years: https://www.seegrid.csiro.au/wiki/Siss/WebHome
>    - And to be honest, much of the underpinning OGC standards efforts
>    that we build on top of.
>
>
>
> This is really laying the groundwork, and it will take many years to get
> there with truly federated data and data services.
>
> It will allow us to spend much less wasted time in getting data prepared
> for global analysis and much more time on the actual analysis and
> understanding the implications of the results.
>
> Agreed!
>
> Thank you for the very meaningful conversation. Looking forward to any
> follow up if you have it.
>
> In the meantime, I come back to my main question. Is there any reason from
> across the group why the current matrix of spatial data formats doesn't
> include formats such as GRIB, HDF4, HDF5, netCDF3, netCDF4, etc? These are
> used pervasively throughout the sciences and I am very surprised to see
> them absent.
>
> Thanks folks.
>
> Lewis
>
>
> --

*Ed Parsons *FRGS
Geospatial Technologist, Google

Google Voice +44 (0)20 7881 4501
www.edparsons.com @edparsons
Received on Tuesday, 29 March 2016 08:56:14 UTC
