RE: Coverage subgroup - document for discussion

Rob and co,

I am not at all confident I follow this, but I see problems looming. If you want to do reasoning with OWL then you must stick inside the OWL-DL language.
The RDF Data Cube vocabulary is already outside OWL-DL (although, I suspect, you could take just a few not-very-helpful axioms out and it would be OWL-DL and do what you want – but this is an untested claim). A quick paper title search reveals there are some papers about doing this – so what to take out may well be a solved problem.

However, I imagine that the reasoning sketched in this thread here is also outside OWL-DL.

You cannot have both arbitrary expressiveness and sound and complete reasoning in the same package. That is just fundamental computer science.
Commonly, people give up soundness because they do not notice the error, and they really do not want to give up completeness (which often shows up as ugly non-termination). Remember “structural subsumption”? Is your “polymorphism and subsumption reasoning” sound? How do you know?

Anyway, for ontologies, sticking inside OWL-DL is a very useful way of staying safe, and the extensible typing built in to OWL can be used if you need it (here is a good validator to check: http://mowl-power.cs.man.ac.uk:8080/validator/).


--Kerry
From: Jon Blower [mailto:j.d.blower@reading.ac.uk]
Sent: Monday, 23 May 2016 4:51 AM
To: Rob Atkinson <rob@metalinkage.com.au>; Peter Baumann <p.baumann@jacobs-university.de>; Bill Roberts <bill@swirrl.com>
Cc: Kerry Taylor <kerry.taylor@anu.edu.au>; public-sdw-wg@w3.org
Subject: Re: Coverage subgroup - document for discussion

Thanks for the explanation Rob – I remember having similar conversations with you a few years ago, glad you’re able to get it moving!

Cheers,
Jon

From: Rob Atkinson <rob@metalinkage.com.au<mailto:rob@metalinkage.com.au>>
Date: Friday, 20 May 2016 22:43
To: Jon Blower <sgs02jdb@reading.ac.uk<mailto:sgs02jdb@reading.ac.uk>>, Rob Atkinson <rob@metalinkage.com.au<mailto:rob@metalinkage.com.au>>, Peter Baumann <p.baumann@jacobs-university.de<mailto:p.baumann@jacobs-university.de>>, Bill Roberts <bill@swirrl.com<mailto:bill@swirrl.com>>
Cc: Kerry Taylor <kerry.taylor@anu.edu.au<mailto:kerry.taylor@anu.edu.au>>, "public-sdw-wg@w3.org<mailto:public-sdw-wg@w3.org>" <public-sdw-wg@w3.org<mailto:public-sdw-wg@w3.org>>
Subject: Re: Coverage subgroup - document for discussion


Hi Jon

I don't want to be didactic about the reasoning - it's an open-ended potential - but the key theory Ingo and I are working on, and will present on in Dublin, is based on polymorphism and subsumption reasoning: data sets and services will conform to an interoperability contract, which will be some sort of multiple-inheritance tree of other contracts that profile a set of baseline standards.

Thus an Observation collection would have a set of declared ranges for its key axes - location (featureOfInterest and/or measured location), time, process, observedProperty etc. - defined against more abstract axis types. The set of services that expose the data would conform to the operations declared for the dataset (that's based on the ISO 19131 view of a Data Product, FYI).

So myObservation may conform to yourSamplingCampaign, which imports the UK Species Register, Global Biodiversity Information Facility, GEOSS, UK Government Open Data and INSPIRE contracts, each of which defines various aspects.

QB axes seem to be the critical part of this, in that they will tell us how data sets relate to each other semantically. If the information is formalised and available, I'm hopeful that others better versed than me will be able to do even cleverer reasoning about how to interpret those relationships for discovery (drill down) or processing (e.g. aggregation).

That's a leap of faith, but basic reasoning, like finding all the INSPIRE-compliant biodiversity survey data that my INSPIRE-theme-competent client can interpret, is definitely easy and useful.
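
(For illustration, a minimal sketch in Python/rdflib of the kind of subsumption query this implies. The contract and dataset names are invented, and a SPARQL property-path query over rdfs:subClassOf stands in for a full OWL reasoner.)

from rdflib import Graph

g = Graph()
g.parse(format="turtle", data="""
@prefix ex:   <http://example.org/contracts#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Contracts modelled as classes; profiling = subclassing (multiple inheritance).
ex:INSPIREBiodiversityContract rdfs:subClassOf ex:INSPIREContract .
ex:yourSamplingCampaign rdfs:subClassOf ex:INSPIREBiodiversityContract ,
                                        ex:UKSpeciesRegisterContract ,
                                        ex:GBIFContract .

# A dataset declares conformance by being typed with the most specific contract.
ex:myObservationCollection a ex:yourSamplingCampaign .
""")

# "Find everything my INSPIRE-competent client can interpret": any dataset
# typed by a direct or indirect specialisation of the INSPIRE contract.
results = g.query("""
PREFIX ex:   <http://example.org/contracts#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?dataset WHERE {
  ?dataset a ?contract .
  ?contract rdfs:subClassOf* ex:INSPIREContract .
}
""")
for row in results:
    print(row.dataset)  # prints the URI of ex:myObservationCollection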

This is just formalising things already happening - like TimeSeriesML profiling O&M - but doing it in an extensible best-practice formalism (OWL) instead of inside documents, and making the mechanisms available to all stakeholders in the data supply chain, not just spec writers.

Rob



On Sat, 21 May 2016 at 01:13 Jon Blower <j.d.blower@reading.ac.uk<mailto:j.d.blower@reading.ac.uk>> wrote:
Hi Rob,

Thanks, I’ve got a better flavour of the motivation now. I’m particularly interested in the part about “supporting useful axiomatic reasoning”. Can you give an example of the sort of things that one might want to reason about in this context?

It’s interesting to think that there might be an overarching meta-model that could unify the various models used by existing services. And I do think that different views will be appropriate to different users. There always seems to be a tension between expressivity and usability – if we can start to resolve that it will be a big bonus.

Cheers,
Jon


From: Rob Atkinson <rob@metalinkage.com.au<mailto:rob@metalinkage.com.au>>
Date: Friday, 20 May 2016 14:10
To: Jon Blower <sgs02jdb@reading.ac.uk<mailto:sgs02jdb@reading.ac.uk>>, Peter Baumann <p.baumann@jacobs-university.de<mailto:p.baumann@jacobs-university.de>>, Rob Atkinson <rob@metalinkage.com.au<mailto:rob@metalinkage.com.au>>, Bill Roberts <bill@swirrl.com<mailto:bill@swirrl.com>>

Cc: Kerry Taylor <kerry.taylor@anu.edu.au<mailto:kerry.taylor@anu.edu.au>>, "public-sdw-wg@w3.org<mailto:public-sdw-wg@w3.org>" <public-sdw-wg@w3.org<mailto:public-sdw-wg@w3.org>>
Subject: Re: Coverage subgroup - document for discussion

Hi Jon,

Hopefully we have a range of users in mind - but the "user pull" in the cases I have been putting forward comes from an analysis of the benefits of interoperability for each of the multiple stakeholders in a chain from sampling design, through data collection and data infrastructure management, to end users. Within this context we find that server and client software developers are mainly served well by the existing styles of specification focussed on schemas and encodings, but users who care about the content face a deficit in metadata expressivity.

We have identified RDF-QB as a meta-model that allows us to describe data and services more accurately at each stage of the chain than any of the fragments of metadata embedded in diverse service descriptions and catalogue-oriented schemas. It's one which allows us to support useful axiomatic reasoning. It's not a matter of familiarity with either QB or spatial services - in fact we aim to demonstrate its use behind simplified RESTful services and views to meet each stakeholder's needs. That's the long story - but necessary to move away from the single-type-of-user style that has hampered adoption and application IMHO.

To put it into scope: it's the identification of Web-centric best practices to overcome limitations in the self-descriptive power of existing services, and the users are the complete set of stakeholders in the chain.

Rob

On Fri, 20 May 2016 at 19:58 Jon Blower <j.d.blower@reading.ac.uk<mailto:j.d.blower@reading.ac.uk>> wrote:
Thanks for the correction Peter! I didn’t realise that, good to know.

Cheers,
Jon

From: Peter Baumann <p.baumann@jacobs-university.de<mailto:p.baumann@jacobs-university.de>>
Organization: Jacobs University Bremen
Date: Friday, 20 May 2016 10:54
To: Jon Blower <sgs02jdb@reading.ac.uk<mailto:sgs02jdb@reading.ac.uk>>, Rob Atkinson <rob@metalinkage.com.au<mailto:rob@metalinkage.com.au>>, Bill Roberts <bill@swirrl.com<mailto:bill@swirrl.com>>

Cc: Kerry Taylor <kerry.taylor@anu.edu.au<mailto:kerry.taylor@anu.edu.au>>, "public-sdw-wg@w3.org<mailto:public-sdw-wg@w3.org>" <public-sdw-wg@w3.org<mailto:public-sdw-wg@w3.org>>
Subject: Re: Coverage subgroup - document for discussion
Hi Jon,
On 05/20/2016 10:37 AM, Jon Blower wrote:
Hi all,

This is an interesting discussion. Where is the “user pull” coming from? Would I be correct in assuming that the driver behind this is “People who are familiar with QB but not spatial data standards, who want to be able to access spatial data using QB?” Could we write a “pen portrait” of such a user?

I’m assuming that the “spatial expert” community isn’t a likely customer for this, since they already have standards that can do this (probably more efficiently than QB can). But if we identify stuff that’s missing from the spatial standards I guess it can be fed back. If the spatial data expert community could benefit from this, can we be clear about how? (E.g. Can QB be more precise about descriptions than existing standards? If so, why?)

By the way, I do think it’s useful to be able to combine spatiotemporal dimensions with “statistical dimensions” (or “categorical dimensions”) in the same coverage. I don’t think this is allowed in CIS

it is; the terminology there is that an axis can be spatial, temporal, or "abstract" (i.e., anything else).
WCS allows subsetting on axes that may well be non-numerical.
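
(For concreteness, a small sketch of a WCS 2.0 GetCoverage KVP request that trims on a spatial axis and on a non-numeric "abstract" axis; the endpoint, coverage id and axis names are made up.)

from urllib.parse import urlencode

# Hypothetical endpoint and axis names; the repeated "subset" parameter is the
# WCS 2.0 KVP trimming/slicing mechanism referred to above.
params = [
    ("service", "WCS"),
    ("version", "2.0.1"),
    ("request", "GetCoverage"),
    ("coverageId", "biodiversity_survey_2016"),
    ("subset", "Lat(50,55)"),                    # numeric, spatial axis
    ("subset", "Long(-6,2)"),
    ("subset", 'species("Falco peregrinus")'),   # non-numeric, "abstract" axis
]
print("http://example.org/wcs?" + urlencode(params))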

-Peter
(I might be wrong), but we allow it in CoverageJSON [1].

Cheers,
Jon

[1] See section 5.3 in the draft spec: https://github.com/Reading-eScience-Centre/coveragejson/blob/master/spec.md


From: Rob Atkinson <rob@metalinkage.com.au><mailto:rob@metalinkage.com.au>
Date: Friday, 20 May 2016 05:08
To: Bill Roberts <bill@swirrl.com><mailto:bill@swirrl.com>, Rob Atkinson <rob@metalinkage.com.au><mailto:rob@metalinkage.com.au>
Cc: Kerry Taylor <kerry.taylor@anu.edu.au><mailto:kerry.taylor@anu.edu.au>, "public-sdw-wg@w3.org"<mailto:public-sdw-wg@w3.org> <public-sdw-wg@w3.org><mailto:public-sdw-wg@w3.org>
Subject: Re: Coverage subgroup - document for discussion
Resent-From: <public-sdw-wg@w3.org><mailto:public-sdw-wg@w3.org>
Resent-Date: Friday, 20 May 2016 05:09

Thanks Bill

Your points concerning the scope of RDF-QB I think describe the challenge I am highlighting - most people will come to it with one of a few different types of data in mind, and the potential need to describe some operations/traversals on the dimensions. There is no detailed guidance available, so I'd suggest that we should try to capture the three main cases: coordinates, grids (which may be simple rectangular tiles or more complex options) and features, and show for each how to describe the operations they support. (The same will apply to time.)

I strongly suspect that focusing on the simplest case will add little or no value - one can simply not bother to describe a coordinate dimension and use a naming convention for it. What I believe is necessary is to work through a subset of cases and identify the similarities - and thus an extensible pattern for describing dimensions and operations on them. Probably the three key cases of coordinates, regular grids and statistical features would be enough. We can then invite others to propose solutions for the other cases we know exist, or put this in a "future work" status. (Not sure of the precise mechanics of the W3C processes here.)
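
(A sketch, in Turtle via rdflib, of the kind of "recognisable class of dimension" declaration this could lead to. The qb: terms are from the RDF Data Cube vocabulary; the sdw-ext: classes are invented placeholders for the abstract dimension types proposed here, not an existing vocabulary.)

from rdflib import Graph

dsd_example = """
@prefix qb:      <http://purl.org/linked-data/cube#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sdw-ext: <http://example.org/sdw-dimension-types#> .
@prefix ex:      <http://example.org/mydata#> .

# Specific dimensions, each typed by an easily recognised abstract class
# (coordinate / regular grid / statistical feature) with an explicit range.
ex:longitude a qb:DimensionProperty, sdw-ext:CoordinateDimension ;
    rdfs:range sdw-ext:DecimalDegrees .
ex:gridCell  a qb:DimensionProperty, sdw-ext:RegularGridDimension ;
    rdfs:range sdw-ext:GridCellIndex .
ex:statArea  a qb:DimensionProperty, sdw-ext:FeatureDimension ;
    rdfs:range sdw-ext:AdministrativeArea .

ex:dsd a qb:DataStructureDefinition ;
    qb:component [ qb:dimension ex:longitude ] ,
                 [ qb:dimension ex:gridCell ] ,
                 [ qb:dimension ex:statArea ] .
"""
g = Graph().parse(data=dsd_example, format="turtle")
print(len(g), "triples parsed")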

I don't claim to have a solution pre-prepared - but I am aiming to test one as it emerges by creating specialised dimension specifications for specific data sets, and to use this as an exemplar in Best Practice recommendations for interoperability in the Citizen Science domain. It's a good test because it's inherently multi-disciplinary and requires both a consistent core and extensibility.

Here is some feedback on potential uses of qb-dimensions to describe URL template variables - not a deep or extensive treatment, but enough to convince me RDF-QB has the basic structure we need and is well enough designed to qualify as a BP.
https://confluence.csiro.au/public/SIRF/datanetwork-api/datacube-description
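
(Not having the details of that page to hand, here is a purely illustrative sketch of the general idea of describing URL template variables with QB dimensions; the ex: linking properties are invented for this example and are not taken from the page above.)

from rdflib import Graph

api_description = """
@prefix ex: <http://example.org/api#> .
@prefix sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#> .

# Each template variable is described by the QB dimension it ranges over.
ex:observationEndpoint
    ex:uriTemplate "http://example.org/obs{?area,period}" ;
    ex:variable [ ex:name "area"   ; ex:dimension sdmx-dimension:refArea ] ,
                [ ex:name "period" ; ex:dimension sdmx-dimension:refPeriod ] .
"""
g = Graph().parse(data=api_description, format="turtle")

# A client can ask which dimension the 'area' variable corresponds to.
q = """
PREFIX ex: <http://example.org/api#>
SELECT ?dim WHERE {
  ?endpoint ex:variable [ ex:name "area" ; ex:dimension ?dim ] .
}
"""
for row in g.query(q):
    print(row.dim)  # prints the URI of sdmx-dimension:refArea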

Any feedback on this appreciated, as I will be building on the approach here, with whatever tweaks and alignments are necessary to meet various SDW outputs. How can this best be fed into the EU project?

cheers
Rob



On Fri, 20 May 2016 at 05:17 Bill Roberts <bill@swirrl.com<mailto:bill@swirrl.com>> wrote:
Hi Rob - many thanks for your comments.  A few initial responses inline below, though it would be good to have a chance to talk these over at some point.  This mail is in a thread discussing the coverage work, but many of your comments are probably more general and relate to the spatial data best practices.  My comments below are mostly with a coverage hat on.


On 19 May 2016, at 00:11, Rob Atkinson <rob@metalinkage.com.au<mailto:rob@metalinkage.com.au>> wrote:


I would point out that this topic, and this thread, exhibit a wide range of overlapping concerns, which would present a new user to the field with a difficult challenge in working out how to address whichever of these concerns they face. That, in a nutshell, is why we need a BP IMHO.

Looking at the SDW charter we see:
"The Recommendation will include provision for describing the subset of coverages that are simple timeseries datasets - where a time-varying property is measured at a fixed location. OGC's WaterML<http://www.opengeospatial.org/standards/waterml> 2 Part 1 - Timeseries will be used as an initial basis.

Given that coverage data can often be extremely large in size, publication of the individual data points as Linked Data may not always be appropriate. The Recommendation will include provision for describing an entire coverage dataset and subsets thereof published in more compact formats using Linked Data. For example where a third party wishes to annotate a subset of a large coverage dataset or a data provider wishes to publish a large coverage dataset in smaller subsets to support convenient reuse."

As I understand it, and I don’t have a history of immersion in the jargon of coverages, in principle 'coverage' can refer to a wide range of data structures with some spatial component.  However the use cases that we came up with and assigned to the 'coverage' strand relate mostly to gridded data, whether satellite images, or 2D or 3D model results etc.  That doesn’t mean we’re not interested in other kinds of data, but we have to decide where to concentrate our limited resources.


In particular I would not limit the concept of spatial to coordinate geometry. IMHO we need BP that allow us to describe coordinates, tessellations and other forms of grids with hierarchies, and hierarchies of nested features.  I would look for a BP that allowed the class of dimension to be easily recognised when defining the specific dimension of a specific datacube, and the domain and range of the dimension to be easily accessed - i.e. definitions and values. I would also be hoping that there was a model that would allow description of transformations between these, but I'd be happy merely for the mechanism for doing this to be identified.  Such a BP would allow me to characterise a set of resources in a useful way, and start to work on the next steps.

A lot of my work outside of this working group relates to statistical data, generally referenced to a tessellation of the country, i.e. a collection of administrative or statistical areas.  The existing RDF Data Cube is pretty well suited to that.  Something that is not defined in detail in the RDF Data Cube spec is describing hierarchical or other relationships between the values of the spatial dimension, but we certainly find that to be a common requirement - for example to enable aggregation of data from small areas to larger areas - and we solve it by documenting and exploiting relationships between different values of the spatial dimension (e.g. the list of all areas of type X that fall within larger area Y).
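
(A sketch of what "documenting and exploiting relationships between different values of the spatial dimension" can look like in practice, using rdflib; the areas, values and the ex:within containment property are invented for illustration, with sdmx-dimension:refArea standing in as the spatial dimension.)

from rdflib import Graph

g = Graph()
g.parse(format="turtle", data="""
@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#> .
@prefix ex: <http://example.org/> .

# Documented relationship between spatial dimension values: X1, X2 fall within Y.
ex:areaX1 ex:within ex:areaY .
ex:areaX2 ex:within ex:areaY .

ex:obs1 a qb:Observation ; sdmx-dimension:refArea ex:areaX1 ; ex:population 120 .
ex:obs2 a qb:Observation ; sdmx-dimension:refArea ex:areaX2 ; ex:population 80 .
""")

# Aggregate the small-area observations up to the larger area Y.
q = """
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#>
PREFIX ex: <http://example.org/>
SELECT ?larger (SUM(?v) AS ?total) WHERE {
  ?obs a qb:Observation ;
       sdmx-dimension:refArea ?small ;
       ex:population ?v .
  ?small ex:within ?larger .
}
GROUP BY ?larger
"""
for row in g.query(q):
    print(row.larger, row.total)  # prints the URI of ex:areaY and the sum 200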

This kind of thing is indeed addressed in the draft best practices work - I have an action to write something on use of statistical data as part of our 'best practices narrative' - a realistic but imagined scenario based around a flooding incident, that involves application of the various strands of best practices work.


The same applies to the time dimension, and the sensor.

Each is complex, but some basic patterns for common cases would help, and provide the potential for future extension to describe how to process those dimensions.

The characterisation of relationships between slices/dices/branes/queries/traversals/derivations etc is also very complex. My own predilection here would be to get the basics of QB dimension description done, and then provide some informative examples of how these may then be used in the context of formally describing a subset.  A future BP effort could then be applied to sorting out all the patterns.

What do you think is missing from the existing definitions of qb:DimensionProperty and the 'data structure definition' approach in RDF Data Cube? (that’s a genuine question, not an assertion that it’s already perfect!)  By the way, in a separate initiative associated with an EU-funded research project around statistical linked data (http://www.opengovintelligence.eu), we’re gathering feedback on how people use RDF Data Cube in practice and their experience/preferences on how to apply the various features of the ontology.  If you’d like to feed into that, please let me know!


SDMX is not the only possible scope, but at least we know there is a set of use cases it does handle. And as Kerry points out, there are things we need that it doesn't handle (yet). We know these extensions are complicated enough that a BP is required.

Each BP pattern would have a limited scope, an example of how it could be applied in a specific circumstance, and a recommendation for further BP scope.

The work on the best practices document is broadly speaking following that kind of approach.  If you have some new use cases covering things that are not listed in the existing Use Cases and Requirements doc (UCR), it would be useful to document those. As I understand it the work on the UCR is still open to extensions.

We could prioritise these requirements in different ways:
1) identifying dependencies - e.g. you can't do slices on geography without having a pattern for the type of geography (slices on coordinates are different from slices on some tessellation, which are different from slices on feature geographies). You prioritise from the top of the tree down until you can offer something useful
2) vote (which probably means taking your favourite end-use case and walking back up the dependency tree, if one is smart)
3) see who works on what and argue about the overlaps.
4) plans D, E, etc

My vote would be:
1) characterise the main types of spatial dimension in coverages and state what is in scope and what is out of scope for BP
2) create abstract classes for the in-scope types (an ontology module) and accept or reject RDF-QB as a basis for this
3) ditto for time
4) ask SSN group to review patterns and propose scope for sensors (scope may be "no BP recommended" or "future work on BP recommended")
5) characterise the main types of spatial and temporal dimension subsetting and state what is in scope and what is out of scope for BP
6) create abstract classes for the in-scope subsetting types (as ontology module), based on the abstract dimensions and accept or reject RDF-QB as a basis for this
7) develop informative examples of how these ontologies may be used to create links and provide enhanced context for accessing services via more complex protocols

The JSON and RDF data encoding work could then use the vocabularies defined in these ontologies to improve the consistency of self-description.

rob






On Thu, 19 May 2016 at 02:48 Kerry Taylor <kerry.taylor@anu.edu.au<mailto:kerry.taylor@anu.edu.au>> wrote:

From: Kerry Taylor
Sent: Thursday, 19 May 2016 12:50 AM
To: 'Little, Chris' <chris.little@metoffice.gov.uk<mailto:chris.little@metoffice.gov.uk>>
Subject: RE: Coverage subgroup - document for discussion

Chris,
I think what you want was in the datacube model in an early draft (a “subslice”), but it did not make it through to the final version.  I used it in a climate data project. However, relying on the beautiful W3C processes, you might be able to understand the why from here: https://www.w3.org/2011/gld/track/issues/34. I think we need something like this for the coverage deliverable if we do go ahead with a qb model.

As I get it, the main reason for dropping it is that it is not defined in SDMX (the stats agency standard) and also (Dave Reynolds)
> The use case you mention is perfectly reasonable, but it can be addressed by a property in an extension namespace, and can be easily re-added in a future version.

So maybe that is something we should do? It makes sense to me.  Surely it can be used to define appropriately granular “extracts” even outside the original data publication. Does this do what you want? I’m not well enough aware of the relevant WCS2.0 capability.
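
(Taking the quoted suggestion literally, the extension could be as small as a single property; a sketch below, where the ex-qb: namespace and property name are made up and are not the dropped draft term itself.)

from rdflib import Graph

g = Graph()
g.parse(format="turtle", data="""
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix qb:    <http://purl.org/linked-data/cube#> .
@prefix ex-qb: <http://example.org/qb-ext#> .
@prefix ex:    <http://example.org/climate#> .

# One property in an extension namespace, linking an extract to its parent slice.
ex-qb:subSliceOf a rdf:Property ;
    rdfs:domain qb:Slice ;
    rdfs:range  qb:Slice ;
    rdfs:comment "Links a finer-grained extract to the slice it was taken from." .

ex:slice-uk-2015     a qb:Slice .
ex:slice-uk-2015-jan a qb:Slice ; ex-qb:subSliceOf ex:slice-uk-2015 .
""")

# Find all extracts (direct or nested) of the 2015 UK slice.
for row in g.query("""
PREFIX ex-qb: <http://example.org/qb-ext#>
PREFIX ex:    <http://example.org/climate#>
SELECT ?extract WHERE { ?extract ex-qb:subSliceOf+ ex:slice-uk-2015 . }
"""):
    print(row.extract)  # prints the URI of ex:slice-uk-2015-jan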

--Kerry
From: Little, Chris [mailto:chris.little@metoffice.gov.uk]
Sent: Wednesday, 18 May 2016 11:49 PM
To: Rob Atkinson <rob@metalinkage.com.au<mailto:rob@metalinkage.com.au>>; Bill Roberts <bill@swirrl.com<mailto:bill@swirrl.com>>; Jon Blower <j.d.blower@reading.ac.uk<mailto:j.d.blower@reading.ac.uk>>
Cc: public-sdw-wg@w3.org<mailto:public-sdw-wg@w3.org>; Roger Brackin <roger.brackin@envitia.com<mailto:roger.brackin@envitia.com>>
Subject: RE: Coverage subgroup - document for discussion

Hi Rob,

This is a bit of a diversion and probably does not help finish this SDW WG topic, but it is a direction I want to go in:

The QB model of dimensions and slices stops short of what is in OGC WCS 2.0 – where any slice can be trimmed (a form of sub-setting) to a bounding box aligned with the dimension axes. So far, so what.

I am interested in the wholesale tiling of a data cube, as a one-off process, to enable a wider range of sub-setting and to support scalability and reuse (if each tile is given a persistent enough id). This is not really anything new, and some would argue it is only an implementation detail. I am still interested. The tiles may not contain just single values from a simple scalar data cube, but may contain point clouds, vector geometry or other stuff – whatever the contents of the original data cube were.

There are a variety of applicable use cases, such as archive granule retrieval, data dissemination to a very large number of low-powered devices, and boundary conditions for a large number of local weather prediction models.

Whether the tiles are treated as a single multi-dimensional coverage or as a collection of a large number of lower-dimensional coverages, I do not mind, but it seems to me that this is a simple and straightforward addition to the QB model.

Is it?
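
(A sketch of one way tiles could be attached to the QB model: each tile as a qb:Slice that fixes a tile-id dimension and carries a persistent URI and a stated extent. The ex: terms and the WKT-string bounding box are placeholders, not an agreed representation.)

from rdflib import Graph

g = Graph()
g.parse(format="turtle", data="""
@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix ex: <http://example.org/tiling#> .

ex:tileId a qb:DimensionProperty .

ex:tileSliceKey a qb:SliceKey ;
    qb:componentProperty ex:tileId .

# One tile = one slice with a persistent, resolvable id and a stated extent.
<http://example.org/dataset/temperature/tile/Z5-X12-Y7> a qb:Slice ;
    qb:sliceStructure ex:tileSliceKey ;
    ex:tileId "Z5-X12-Y7" ;
    ex:extent "POLYGON((-5 50, 2 50, 2 55, -5 55, -5 50))" .
""")
print(len(g), "triples parsed")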

Chris
From: Rob Atkinson [mailto:rob@metalinkage.com.au]
Sent: Wednesday, May 18, 2016 1:50 PM
To: Bill Roberts; Jon Blower
Cc: public-sdw-wg@w3.org<mailto:public-sdw-wg@w3.org>
Subject: Re: Coverage subgroup - document for discussion

Hi,
I've put some detail on the page https://www.w3.org/2015/spatial/wiki/Data_cube_for_coverage to identify different possible directions for this aspect.

FYI my project with OGC is concerned with UC1 and UC2, which seem complementary to the other activities supporting this thread.

Cheers
Rob Atkinson


On Wed, 18 May 2016 at 20:45 Bill Roberts <bill@swirrl.com<mailto:bill@swirrl.com>> wrote:
Thanks Jon, that's a useful perspective.  Certainly we talk about making discovery and retrieval of the data easier, working nicely with web-based technology etc - so we need to be clear about 'easier for whom'.  Inevitably different people will want different things so we will have to be explicit about our priorities.

The existing use cases cover quite a few of the scenarios you have sketched out, but they don't yet link those to these kind of user personas.  That might be worth doing - it probably wouldn't take long.




On 18 May 2016 at 11:35, Jon Blower <j.d.blower@reading.ac.uk<mailto:j.d.blower@reading.ac.uk>> wrote:
Hi Bill, all,

Just some initial thoughts in advance of our telecon. There is lots of good stuff in here, and it’s all relevant to the general area of “Coverages”. Some of these issues are of course very complex and I don’t think we’ll solve them all – and in fact this group might not be the best place to do so.

I wonder if it would help to structure the document and our thinking around the different audiences we might aim at. For example:

 * A “web developer” might need some explanation of what a coverage is (“dummies’ guide”). He/she would probably like a simple API to access them, and some simple formats with which he/she is familiar. The applications are likely to be reasonably simple and visualisation-oriented, rather than “deep” analysis.

 * A “spatial data publisher” might already be familiar with the terminology, but might want to know how to make his/her data more discoverable by mass-market search engines, or how best to make use of Linked Data and semantic stuff. He/she is probably going to be keen to describe coverage data very precisely (e.g. using the “right” CRS and full-res geometries), but is also interested in the cost/benefit tradeoff.

 * A “data analyst/scientist” might be interested in quality and uncertainty, and how to bring coverage data into his/her tools (e.g. GIS, Python scripts). This kind of person may just want to download the data files in an unmodified form, although data-extraction services can be useful in some circumstances (and hosted processing is increasingly popular).

 * An “environmental consultant” may have very limited time to perform some kind of analysis to form part of a report. If a dataset is hard to find, access or understand, it will probably simply be omitted from the analysis. Often interested in a very specific geographic area. Needs to quickly establish that a dataset is trustworthy.

 * An “IT provider” might be interested in scalable and maintainable web services for high-volume data that can be made part of his/her organisation’s operational procedures. He/she probably has a low tolerance for high-complexity or “bleeding edge” technology.

This is just off the top of my head, and there are certainly more, and there will also be lots of overlap. And I’m sure there’s lots to argue about there. But this helps me, at least, put some structure on the Big List. For each of these kinds of user, what would be the most useful thing that we could do to help them (maybe a new technology, or a recommendation to use something existing, or an admission that the problem remains unsolved), in the context of this group?

(Am I just reinventing the Use Cases here, or is this still useful for the Coverage requirements?)

Cheers,
Jon



From: Bill Roberts <bill@swirrl.com<mailto:bill@swirrl.com>>
Date: Tuesday, 17 May 2016 23:44
To: "public-sdw-wg@w3.org<mailto:public-sdw-wg@w3.org>" <public-sdw-wg@w3.org<mailto:public-sdw-wg@w3.org>>
Subject: Coverage subgroup - document for discussion
Resent-From: <public-sdw-wg@w3.org<mailto:public-sdw-wg@w3.org>>
Resent-Date: Tuesday, 17 May 2016 23:44

Hi all

I've made some initial notes on requirements in this wiki page:

https://www.w3.org/2015/spatial/wiki/Coverage_draft_requirements


I'd like to go through this on the call tomorrow (we probably won't get all the way through it as there is quite a lot there).  If you are joining the call it would be great if you could look at it in advance.

Comments also welcome via this mailing list.

Cheers

Bill





--

Dr. Peter Baumann

 - Professor of Computer Science, Jacobs University Bremen

   www.faculty.jacobs-university.de/pbaumann<http://www.faculty.jacobs-university.de/pbaumann>

   mail: p.baumann@jacobs-university.de<mailto:p.baumann@jacobs-university.de>

   tel: +49-421-200-3178, fax: +49-421-200-493178

 - Executive Director, rasdaman GmbH Bremen (HRB 26793)

   www.rasdaman.com<http://www.rasdaman.com>, mail: baumann@rasdaman.com<mailto:baumann@rasdaman.com>

   tel: 0800-rasdaman, fax: 0800-rasdafax, mobile: +49-173-5837882

"Si forte in alienas manus oberraverit hec peregrina epistola incertis ventis dimissa, sed Deo commendata, precamur ut ei reddatur cui soli destinata, nec preripiat quisquam non sibi parata." (mail disclaimer, AD 1083)

Received on Monday, 23 May 2016 06:35:57 UTC