Re: Absence of key scientific spatial data formats within common formats to implementation of Best Practices [SEC=UNCLASSIFIED]

One perspective to keep in mind is that a format such as NetCDF is amazingly capable, and not particularly accessible. Specifically, there is rarely a graduated approach to finding and using such a resource. Either you search for it, download it, and open it in a compatible application, or you don’t. A “Spatial Data on the Web” approach would be to provide more ways to interact with simpler, but semantically expressive representations on the Web and then follow links to both the full NetCDF resource and (Web) applications that can deal with it. Think of a hurricane path prediction, with 100 million people looking at a simple graphic, 10,000 people viewing the NetCDF model coverage, and 1000 people actually pulling it into other models. How do we make that a Web and linked data process?

Josh

> On Mar 30, 2016, at 1:52 AM, Ed Parsons <eparsons@google.com> wrote:
> 
> Yes, fully accept the "relative" part, but we are talking about "the web" as the user community and that is even bigger than Google ;-)
> 
> Ed
> 
> 
> On Wed, 30 Mar 2016 at 09:43 Bill Roberts <bill@swirrl.com> wrote:
> Hi Lewis
> 
> Thanks for the very useful input on scientific data formats.  I don't remember exactly the background to the list of spatial formats that was the starting point for this thread, but although those formats like GML, GeoJSON etc can encode 'data', I (and perhaps others?) tend to think of those as formats for 'geometry' whereas I think of NetCDF as a format for 'data'. 
> 
> Anyway, I don't want to get into that, I just want to note that I'm one of the editors of the Coverages sub-group and the points you have raised in recent emails are very relevant for that.  The main issues we are discussing in that group relate to 'web-friendly' formats for coverage data (whatever web-friendly turns out to mean in that context!) and approaches to identifying and retrieving extracts (aka subsets) of Coverage data.
> 
> For both of those, it would be great to get your input and to make sure we take due consideration of use cases that are important to JPL, NASA etc.
> 
> Cheers
> 
> Bill
> 
> 
> 
> On 30 March 2016 at 07:38, lewis john mcgibbney <lewismc@apache.org> wrote:
> Hi Bruce,
> Replies inline
> 
> On Mon, Mar 28, 2016 at 7:21 PM, Bruce Bannerman <B.Bannerman@bom.gov.au> wrote:
> 
> 
> Hi Lewis,
> 
> More inline below.
> 
> Bruce
> 
>  
> Regarding describing our datasets:
> We don’t do this as well as we could. We will begin addressing this in the near future. Much of our current work is either internally focussed, or at a much too granular level.
> 
> Understood. This also seems to be quite a common observation from the datasets (and the platforms which make this data available) I work with so I acknowledge the point.
> We intend to describe our data sets using ISO 19115 with support for several profiles, including WMO and ANZLIC.
> I believe the OCO2 products (and many others) I've worked with also came with ISO 19115 metadata within the data product. These products were HDF5. I am familiar with the ISO standard(s) as well.
> http://oco.jpl.nasa.gov/science/ProductInfo/#
>  
> I can’t see us moving away from this paradigm, but there is certainly potential for LinkedData approaches as alternate methods of discovering our data.
> Agreed. This is where my work in this group is (again) justified. 
>  
> But this is also only part of the issue: 
> We also need a mechanism to better understand the context of our observations (e.g. What sensor; what model; when was it last calibrated; maintained; what sensor maintenance process and responsible party; what observation process etc). We will be using the new WMO WIGOS Observations Metadata standard to support this concept.
> Interesting. Have you looked at encoding this into any linked data approach as of yet?
>  
> As discussed before on this list (and in the SDWWG Climate data related use case), there is also the issue of data provenance.
> Yes there sure is. We make heavy use of the PROV-ES specification here @JPL for such requirements.  
> Data Quality and IP issues will also become a big issue, particularly with the increasing use of mixed Bureau and 3rd party observations and the subsequent derived products that we create from these observations.
> Derived and value added products are certainly in high demand for new(er) data products which are made available, however without an expansion on this topic I see this more as a process issue rather than one relating to the spatial data format itself. This is absolutely OK though. If you feel like expanding then I am all ears. I see that such topics feature heavily within the CDMS spec you posted below. These are relevant topics indeed however I'll state that I am not sure they feature on the current agenda for this WG.
>  
> 
> 
> 
> Further, I expect that we'll need to go further and work with our peers to agree on semantic definitions of the content that we portray for each relevant domain and its inter-relationships with other domains.
> 
> This sounds like the next step... the issues we're discussing above seem like the precursor. Am I correct?
> 
> Not necessarily, consider the work that has been undertaken on GeoSciML, WaterML etc.
> 
> A lot of this is based on communities of a common interest getting together and agreeing on and using common terms and concepts.
> 
> It takes many, many years of community building to reach the required consensus.
> 
> OK, I was just trying to bring it back to what we can achieve within the maneuverability and scope of this WG.
>  
> 
> 
> Do you have any examples from the field of Meteorology? I would be interested to see if I could pick out any examples more familiar to other aspects of Earth Science, Physical Oceanography or something else a bit closer to 'home' for my current working agenda.
> 
> 
> The closest that I can point to at the moment is the work that we have been doing in WMO on WMO #1131, Climate Data Management System Specifications: http://library.wmo.int/opac/index.php?lvl=notice_display&id=16300
> 
> Wow this is a meaty, very substantial document. It will take me a while to read as it's the first time I've seen it. I undertook a preliminary search for 'data access' and 'access' and it returned a few results so I will scope them out and see what interesting content I can muse over. 
>  
> 
> There is also related work, e.g.:
> Foundation data governance and data modelling work within WMO that Jeremy Tandy is leading
> Foundation work that has been undertaken by Australia’s CSIRO over many years: https://www.seegrid.csiro.au/wiki/Siss/WebHome
> And to be honest, much of the underpinning OGC standards efforts that we build on top of.
> 
> This is really laying the groundwork, and it will take many years to get there with truly federated data and data services.
> 
> So what are your thoughts then about how this all fits in with one or more of the aims of this WG? When worded as it has been above, this scientific data angle (which you, I and a few others are coming from) seems to be somewhat different from that of the other working group members. It is certainly a different conversation we are having here from what I have seen or heard going on elsewhere in this WG. I've also checked the WG mailing list archives and there is very little conversation at all about scientific data formats within the overall context of this WG.
>  
> Thanks. I am glad to see that this thread is now picking up some traction.
> Lewis
> 
> -- 
> Ed Parsons FRGS
> Geospatial Technologist, Google
> 
> Google Voice +44 (0)20 7881 4501
> www.edparsons.com <http://www.edparsons.com/> @edparsons
> 

Received on Wednesday, 30 March 2016 17:55:39 UTC