Case study: Site-specific weather forecasts

(This section contributed by Dave Reynolds)

The Met Office, the UK's national weather service, provides a range of weather forecast products, including openly available site-specific forecasts for the UK. The site-specific forecasts cover over 5000 forecast points. Each forecast predicts 10 parameters and spans a 5-day window at 3-hourly intervals, and the whole forecast is updated every hour. A proof-of-concept project investigated the challenge of publishing this information as linked data using the Data Cube vocabulary.
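To give a feel for the scale involved, the figures above can be multiplied out. This is a rough sketch only: it treats "over 5000" as exactly 5000 and assumes the end point of the 5-day window is not counted as an extra time step, neither of which is stated in the text.

```python
# Rough size of one full forecast run, using the figures quoted above.
SITES = 5000        # "over 5000" forecast points, so this is a lower bound
PARAMETERS = 10     # parameters predicted per site and time step
DAYS = 5            # length of the forecast window
STEP_HOURS = 3      # spacing between forecast time steps

# Assumption: the end of the window is not counted, giving 8 steps per day.
time_steps = DAYS * 24 // STEP_HOURS
site_time_pairs = SITES * time_steps
values_per_run = site_time_pairs * PARAMETERS

print(f"time steps per site: {time_steps}")
print(f"site/time pairs:     {site_time_pairs}")
print(f"values per run:      {values_per_run}")
```

Under these assumptions each hourly update carries at least two million individual forecast values.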

Challenges

Solution

The Observations and Measurements (O&M) standard, ISO19156, provides a data model for an Observation with an associated Phenomenon, measurement ProcessUsed, Domain (feature of interest) and Result. Prototype vocabularies developed at CSIRO and extended within this project allow this data model to be represented in RDF. For the site-specific forecasts, a 5-day forecast for all 5000+ sites is regarded as a single O&M Observation.

The forecast data itself is the Result in the O&M model, and the relevant standard for representing it is ISO19123 "Geographic information — Schema for coverage geometry and functions". This standard provides a data model for a Coverage, which can represent a set of values across some space. It defines several types of Coverage, including a DiscretePointCoverage suited to representing site-specific forecast results.

An RDF Data Cube can be treated straightforwardly as a particular concrete representation of the DiscretePointCoverage logical model. The cube has dimensions corresponding to the forecast time and location, and the measure is a record representing the forecast values of the 10 phenomena. Slices by time and by location provide subsets of the data that directly match the data packages supported by an existing online service.
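The cube structure described here can be sketched in a few lines of Python. This is an illustrative in-memory model, not the RDF representation used in the project: the class name `CubeObservation`, the helper functions, the site identifiers and the two phenomenon names are all hypothetical, and the values are placeholders rather than forecast data.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class CubeObservation:
    """One cell of the cube: two dimension values and a record-valued measure."""
    forecast_time: datetime  # dimension
    site: str                # dimension (hypothetical site identifier)
    values: dict             # measure: record mapping phenomenon name to value


def build_cube(sites, start, steps, step_hours=3):
    """Create one observation per (site, time step) pair with placeholder values."""
    cube = []
    for site in sites:
        for i in range(steps):
            when = start + timedelta(hours=i * step_hours)
            # Placeholder record; the real measure holds 10 phenomena.
            record = {"air_temperature": 0.0, "wind_speed": 0.0}
            cube.append(CubeObservation(when, site, record))
    return cube


def slice_by_site(cube, site):
    """Fix the location dimension: the full time series for one site."""
    return [obs for obs in cube if obs.site == site]


def slice_by_time(cube, when):
    """Fix the time dimension: all sites at one forecast time."""
    return [obs for obs in cube if obs.forecast_time == when]


cube = build_cube(["site-a", "site-b"], datetime(2013, 1, 1), steps=40)
print(len(cube), len(slice_by_site(cube, "site-a")))
```

Each `CubeObservation` plays the role of one cell addressed by time and location, and the two slice helpers fix one dimension at a time, which is the kind of subset the text says matches the existing service's data packages.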

Note that in this situation an observation in the sense of qb:Observation and an observation in the sense of ISO19156 Observations and Measurements are different things. The O&M Observation is the whole forecast whereas each qb:Observation corresponds to a single GeometryValuePair within the forecast results Coverage.

Regarding bandwidth costs, the key factor is not raw data volume but compressibility, since such data is transmitted in compressed form. A Turtle representation of a non-abbreviated data cube compressed to within 15-20% of the size of compressed, handcrafted XML and JSON representations, which obviates the need for abbreviations or custom serialization.
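The effect of compression on a verbose serialization can be explored with a small experiment along the following lines. This is a sketch on synthetic data, not a reproduction of the project's measurement: the `ex:` namespace, the property names and the generated values are hypothetical, and gzip is assumed as the transport compression.

```python
import gzip
import json


def turtle_like(n):
    """Verbose, non-abbreviated Turtle-style text with one statement per observation."""
    lines = [
        "@prefix qb: <http://purl.org/linked-data/cube#> .",
        "@prefix ex: <http://example.org/forecast#> .",  # hypothetical namespace
    ]
    for i in range(n):
        lines.append(
            f"ex:obs{i} a qb:Observation ; ex:site ex:site{i % 50} ; "
            f"ex:timeStep {i // 50} ; ex:airTemperature {i % 17} ; "
            f"ex:windSpeed {i % 11} ."
        )
    return "\n".join(lines).encode("utf-8")


def compact_json(n):
    """Handcrafted compact form: the same synthetic values as bare arrays."""
    rows = [[i % 50, i // 50, i % 17, i % 11] for i in range(n)]
    return json.dumps(rows, separators=(",", ":")).encode("utf-8")


def report(n=2000):
    """Return raw and gzip-compressed byte counts for both serializations."""
    raw_turtle, raw_json = turtle_like(n), compact_json(n)
    return {
        "turtle_raw": len(raw_turtle),
        "turtle_gz": len(gzip.compress(raw_turtle)),
        "json_raw": len(raw_json),
        "json_gz": len(gzip.compress(raw_json)),
    }


sizes = report()
for name, size in sizes.items():
    print(f"{name:>10}: {size} bytes")
```

Comparing the raw and compressed sizes shows how much of the verbose form's overhead is repeated boilerplate that a general-purpose compressor removes. The synthetic values here are far more regular than real forecast data, so the ratios it prints should not be read as confirming the 15-20% figure.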

Lessons