W3C SDWIG Stats BP: 'Parameter' use case

Dear Bill and Stats BP colleagues,

1. Many container data formats, and even service APIs and protocols, have controlled lists/taxonomies of parameters/observations/variables/measurements. These values of interest may be scalar, vector or even tensor valued: for example, surface atmospheric pressure, sub-surface ocean current velocity, and wind stress (used to forecast ocean waves), respectively.

In Meteorology and Oceanography, these lists have been maintained globally, in multiple languages, for decades. Three major container formats that use these kinds of lists are: 
  
NetCDF - a generic format with a large ecosystem of tools and applications, and several conventions for metadata, such as CF http://cfconventions.org/Data/cf-standard-names/46/build/cf-standard-name-table.html  and COARDS;
  
GRIB - a similar, more compact operational format for multidimensional gridded data, with tightly controlled lists/tables managed by WMO, see http://www.wmo.int/pages/prog/www/WMOCodes/WMO306_vI2/LatestVERSION/WMO306_vI2_GRIB2_CodeFlag_en.pdf Code Table 4.2;
  
BUFR - another WMO operational format, suitable for point-, line- and polygon-like features, with thousands of entries in its controlled lists, see http://www.wmo.int/pages/prog/www/WMOCodes/WMO306_vI2/LatestVERSION/WMO306_vI2_BUFRCREX_TableB_en.pdf .

To keep these lists manageable, and to avoid combinatorial explosions of possibilities, attributes or qualifiers have been constructed so that various derived statistics of the parameters can be indicated in the metadata, such as mean, median, standard deviation, variance, etc., without creating new entries.
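
As a rough illustration of why qualifiers keep the lists manageable, here is a minimal Python sketch. The parameter names, the qualifier names and the QualifiedParameter class are all hypothetical and are not taken from any of the registries above; CF's cell_methods attribute (e.g. "time: mean") is one real mechanism in the same spirit. The sketch only compares the size of a registry with one entry per parameter/statistic combination against one that keeps parameters and qualifiers as separate lists.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical entries, for illustration only (not real registry content).
PARAMETERS = ["air_pressure", "wind_speed", "sea_water_velocity"]
QUALIFIERS = ["mean", "median", "standard_deviation", "variance"]


@dataclass(frozen=True)
class QualifiedParameter:
    """A base parameter plus an optional statistical qualifier."""
    parameter: str
    statistic: Optional[str] = None  # None means the plain, unqualified value

    def label(self) -> str:
        if self.statistic is None:
            return self.parameter
        return f"{self.parameter} [{self.statistic}]"


def flat_registry(parameters, qualifiers):
    """One dedicated entry per combination: grows as P * (Q + 1)."""
    entries = list(parameters)
    entries += [f"{q}_{p}" for p in parameters for q in qualifiers]
    return entries


def qualified_registry_size(parameters, qualifiers):
    """Two separate controlled lists: grows as P + Q."""
    return len(parameters) + len(qualifiers)


flat = flat_registry(PARAMETERS, QUALIFIERS)
print(len(flat), "entries without qualifiers")
print(qualified_registry_size(PARAMETERS, QUALIFIERS), "entries with qualifiers")
print(QualifiedParameter("wind_speed", "mean").label())
```

With thousands of parameters, as in BUFR, the difference between the two growth rates is what makes the qualifier approach attractive.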

These schemes are incomplete: quantile statistics such as quartiles, quintiles, deciles and even percentiles of a parameter distribution are routinely used, but there is no standard scheme for creating and applying the corresponding qualifiers. It is best practice in meteorology and oceanography to forecast a range of values, known as an ensemble, for a parameter of interest, and then extract various statistics and threshold values. The ensembles typically have 50-100 members.
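
To make the ensemble statistics concrete, the following Python sketch extracts quartiles, deciles, percentiles and a threshold exceedance fraction from a 50-member ensemble. The ensemble values are synthetic (random numbers, not forecast data), and the threshold and the exceedance_fraction helper are hypothetical names chosen for illustration; only the standard library is used.

```python
import random
import statistics

random.seed(0)  # fixed seed so the synthetic data are reproducible

# Synthetic 50-member ensemble of wind speed in m/s (made-up values).
ensemble = [random.gauss(10.0, 2.0) for _ in range(50)]

# statistics.quantiles returns the n - 1 cut points that divide the data
# into n groups of equal probability.
quartiles = statistics.quantiles(ensemble, n=4)      # 3 cut points
deciles = statistics.quantiles(ensemble, n=10)       # 9 cut points
percentiles = statistics.quantiles(ensemble, n=100)  # 99 cut points


def exceedance_fraction(members, threshold):
    """Fraction of ensemble members strictly above a threshold value."""
    return sum(1 for m in members if m > threshold) / len(members)


THRESHOLD = 12.0  # hypothetical threshold in m/s
fraction_above = exceedance_fraction(ensemble, THRESHOLD)

print("quartiles:", [round(q, 2) for q in quartiles])
print("90th percentile:", round(percentiles[89], 2))
print("fraction above threshold:", fraction_above)
```

Each of these derived values would need its own qualifier (which quantile, which threshold) in the metadata, which is the gap described above.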

The various schemes and the controlled lists are also inconsistent. For example, one policy has been to generate extra entries for commonly used statistics of parameters, so a registry may contain both (instantaneous) wind speed and mean wind speed.

The use case, or more precisely the requirement, is for a standard statistical scheme that allows the consistent and rigorous generation of a variety of statistical qualifiers, producing useful, machine-readable metadata to qualify lists of parameters in a variety of domains.

Chris

Chris Little
Chair, OGC Meteorology & Oceanography Domain Working Group
Member OGC Architecture Board

IT Fellow - Operational Infrastructures
Met Office  FitzRoy Road  Exeter  Devon  EX1 3PB  United Kingdom
Tel: +44(0)1392 886278  Fax: +44(0)1392 885681  Mobile: +44(0)7753 880514
E-mail: chris.little@metoffice.gov.uk  http://www.metoffice.gov.uk

I am normally at work Tuesday, Wednesday and Thursday each week

Received on Thursday, 23 November 2017 14:10:36 UTC