- From: lewis john mcgibbney <lewismc@apache.org>
- Date: Fri, 25 Mar 2016 14:24:40 -0700
- To: Bruce Bannerman <B.Bannerman@bom.gov.au>
- Cc: SDW WG Public List <public-sdw-wg@w3.org>
- Message-ID: <CAGaRif0aBSHuWQYiAQ6b21yVW2WkWCfSgLKo6GfKc-ZWLPXmOg@mail.gmail.com>
Hi Bruce,

Thanks for your response. I'll make some comments below.

On Wed, Mar 23, 2016 at 1:26 PM, Bruce Bannerman <B.Bannerman@bom.gov.au> wrote:

> Hi Lewis,
>
> I have still to find the time to review the latest document and provide
> comment.
>

OK. I would like to see your comments when you do, and I suppose I will have some replies.

>
> However regarding this issue, we have no intention of moving away from the
> scientific data formats that we use within our large data holdings.
>

Same here. This is where I see some value in addressing the following area, in order to still derive value from the spatial data encoded within the dataset(s).

Many (virtually all) of our dataset landing pages, believe it or not, still do not have any kind of semantic markup and hence are relatively undiscovered outside of the NASA Distributed Active Archive Center (DAAC) portals. An example is the landing page at [0], which describes the SeaWinds on QuikSCAT Enhanced Resolution Regionally Gridded Sigma-0 (BYU, D. Long) dataset. When I extract the implicit semantic markup from within this page (using Apache Any23 [1]), I get very few meaningful relationships which I can utilize programmatically. The extracted result in JSON [2] shows you that.

I do however also think that moving towards a hypermedia-based mechanism for describing the data granules behind these dataset landing pages is useful. I found Linda's recent post on the Dutch crawling task very interesting in this regard.

What are your thoughts here? Do you describe your datasets in any meaningful way? I think that there is a HUGE amount of work to be done here to improve programmatic interpretation of the underlying scientific data.
[0] http://podaac.jpl.nasa.gov/dataset/QUIKSCAT_BYU_L3_OW_SIGMA0_ENHANCED?ids=Measurement&values=Sea%20Ice
[1] http://any23.apache.org
[2] https://paste.apache.org/lphl

>
> If anything, I expect that we will need to work with our peers to define
> formal data format definitions that are consistent with modern spatial
> requirements, e.g. full support for Spatial Reference Systems and other CRS
> definition and that don't constrain our ability to adequately portray the
> complexity of our data.
>

I agree here.

> I expect that we'll probably need to do this via OGC processes. We want to
> ensure that the data that we collect and archive now will still be
> accessible for our key stakeholders who have not yet been born.
>

This view is consistent across the entire NASA data archival spectrum as well, and for long-term data stewards it is a logical viewpoint.

>
> Further, I expect that we'll need to go further and work with our peers to
> agree on semantic definitions of the content that we portray for each
> relevant domain and its inter-relationships with other domains.
>

This sounds like the next step; the issues we're discussing above seem like the precursor. Am I correct?

> This is similar in concept to what the hydrology community have done with
> WaterML 2, but I expect that we'll need to take it further, particularly
> the inter-domain relationships.
>

Yes, I really like the ongoing work on hydrology with WaterML 2, and this is an excellent point. It is however, again in my own opinion, something which follows on from the above.

>
> When we are trying to understand global systems and their interaction with
> other systems, and we are doing this with our peers in distributed data
> collections and services, the need for formal data definitions become
> critical. This is especially so if we want global, federated, data sets
> ***and dynamic services*** describing specific phenomena.
>

Do you have any examples from the field of Meteorology?
I would be interested to see if I could pick out any examples more familiar to other aspects of Earth Science, Physical Oceanography or something else a bit closer to 'home' for my current working agenda.

> It will allow us to spend much less wasted time in getting data prepared
> for global analysis and much more time on the actual analysis and
> understanding the implications of the results.
>
>

Agreed! Thank you for the very meaningful conversation. Looking forward to any follow-up if you have it.

In the meantime, I come back to my main question. Is there any reason from across the group why the current matrix of spatial data formats doesn't include formats such as GRIB, HDF4, HDF5, netCDF3, netCDF4, etc.? These are used pervasively throughout the sciences and I am very surprised to see them absent.

Thanks folks.
Lewis
Received on Friday, 25 March 2016 21:25:15 UTC