Re: Absence of key scientific spatial data formats within common formats to implementation of Best Practices [SEC=UNCLASSIFIED]

Hi Bruce,
Thanks for your response. I'll make some comments below.

On Wed, Mar 23, 2016 at 1:26 PM, Bruce Bannerman <B.Bannerman@bom.gov.au>
wrote:

> Hi Lewis,
>
> I have still to find the time to review the latest document and provide
> comment.
>

OK. I would like to see your comments when you do and I suppose I will have
some replies.


>
> However regarding this issue, we have no intention of moving away from the
> scientific data formats that we use within our large data holdings.
>

Same here. This is where I see some value in addressing the following area
in order to still derive value from the spatial data encoded within the
dataset(s). Many (virtually all) of our dataset landing pages, believe it
or not, still do not have any kind of semantic markup and hence are
relatively undiscovered outside of the NASA Distributed Active Archive
Center (DAAC) portals. An example is the landing page at [0], which
describes the SeaWinds on QuikSCAT Enhanced Resolution Regionally Gridded
Sigma-0 (BYU, D. Long) dataset. When I extract the implicit semantic markup
from within this page (using Apache Any23 [1]) I get very few meaningful
relationships which I can utilize programmatically. The extracted result in
JSON [2] shows you that.
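
As a rough illustration of the kind of markup that is missing, a landing
page could embed a schema.org Dataset description as JSON-LD. The sketch
below is only an assumption about what such markup might look like; it is
not what PO.DAAC publishes. The helper names (dataset_jsonld,
as_script_tag) are hypothetical, and the global bounding box is a
placeholder value, not the real extent of the dataset.

```python
import json


def dataset_jsonld(name, description, url, bbox, keywords):
    """Build a schema.org Dataset description as a JSON-LD dictionary.

    bbox is (south, west, north, east) in decimal degrees. schema.org
    expresses a GeoShape box as two space-separated points: lower corner
    first, then upper corner.
    """
    south, west, north, east = bbox
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": list(keywords),
        "spatialCoverage": {
            "@type": "Place",
            "geo": {
                "@type": "GeoShape",
                "box": f"{south} {west} {north} {east}",
            },
        },
    }


def as_script_tag(doc):
    """Wrap a JSON-LD document in the script element used to embed it in HTML."""
    return (
        '<script type="application/ld+json">\n'
        + json.dumps(doc, indent=2)
        + "\n</script>"
    )


if __name__ == "__main__":
    doc = dataset_jsonld(
        name="SeaWinds on QuikSCAT Enhanced Resolution Regionally Gridded Sigma-0",
        description="Enhanced resolution regionally gridded Sigma-0 (BYU, D. Long).",
        url="http://podaac.jpl.nasa.gov/dataset/QUIKSCAT_BYU_L3_OW_SIGMA0_ENHANCED",
        bbox=(-90, -180, 90, 180),  # placeholder extent, NOT the real coverage
        keywords=["Sea Ice", "Sigma-0"],
    )
    print(as_script_tag(doc))
```

With something like this in the page, an extractor such as Any23 would at
least have a typed Dataset with a name, URL and spatial extent to work
with.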

I do however think that moving towards a hypermedia-based mechanism for
describing the data granules behind these dataset landing pages is also
useful. I found Linda's recent post on the Dutch crawling task very
interesting in this regard.

What are your thoughts here? Do you describe your datasets in any
meaningful way? I think that there is a HUGE amount of work to be done
here to improve programmatic interpretation of the underlying scientific
data.

[0]
http://podaac.jpl.nasa.gov/dataset/QUIKSCAT_BYU_L3_OW_SIGMA0_ENHANCED?ids=Measurement&values=Sea%20Ice
[1] http://any23.apache.org
[2] https://paste.apache.org/lphl



>
> If anything, I expect that we will need to work with our peers to define
> formal data format definitions that are consistent with modern spatial
> requirements, e.g. full support for Spatial Reference Systems and other CRS
> definition and that don't constrain our ability to adequately portray the
> complexity of our data.
>

I agree here.


> I expect that we'll probably need to do this via OGC processes. We want to
> ensure that the data that we collect and archive now will still be
> accessible for our key stakeholders who have not yet been born.
>

This view is consistent across the entire NASA data archival spectrum as
well... and as long term data stewards this is a logical viewpoint.


>
> Further, I expect that we'll need to go further and work with our peers to
> agree on semantic definitions of the content that we portray for each
> relevant domain and its inter-relationships with other domains.
>

This sounds like the next step... the issues we're discussing above seem
like the precursor. Am I correct?


> This is similar in concept to what the hydrology community have done with
> WaterML 2, but I expect that we'll need to take it further, particularly
> the inter-domain relationships.
>

Yes, I really like the ongoing work on hydrology with WaterML2 and this is
an excellent point. It is however, again in my own opinion, something which
follows on from the above.


>
> When we are trying to understand global systems and their interaction with
> other systems, and we are doing this with our peers in distributed data
> collections and services, the need for formal data definitions become
> critical. This is especially so if we want global, federated, data sets
> ***and dynamic services*** describing specific phenomena.
>

Do you have any examples from the field of Meteorology? I would be
interested to see if I could pick out any examples more familiar to other
aspects of Earth Science, Physical Oceanography or something else a bit
closer to 'home' for my current working agenda.


> It will allow us to spend much less wasted time in getting data prepared
> for global analysis and much more time on the actual analysis and
> understanding the implications of the results.
>

Agreed!
Thank you for the very meaningful conversation. Looking forward to any
follow up if you have it.

In the meantime, I come back to my main question. Is there any reason from
across the group why the current matrix of spatial data formats doesn't
include formats such as GRIB, HDF4, HDF5, netCDF3, netCDF4, etc? These are
used pervasively throughout the sciences and I am very surprised to see
them absent.
Thanks folks.
Lewis

Received on Friday, 25 March 2016 21:25:15 UTC