[QB] ISSUE-33 Discussion and possible resolutions from Dave Reynolds on 2013-02-27 (public-gld-wg@w3.org from February 2013)

From: Dave Reynolds <dave.e.reynolds@gmail.com>
Date: Wed, 27 Feb 2013 18:02:57 +0000
To: Government Linked Data Working Group <public-gld-wg@w3.org>
Message-ID: <512E4A51.3080200@gmail.com>
ISSUE-33 [1] is about generalizing qb:Slice.

As it says in the current spec:

"""Slices allow us to group subsets of observations together. This not 
intended to represent arbitrary selections from the observations but 
uniform slices through the cube in which one or more of the dimension 
values are fixed."""

Slices confer three benefits:
  1. they guide consuming applications in how to present the data
  2. they provide an identifier (e.g. for external annotation)
  3. they allow a less bulky, abbreviated format

Thus normal usage for qb:Slice is that each slice is associated with a 
qb:SliceKey, which in turn lists those dimensions that are fixed in the 
corresponding qb:Slices. The vocabulary does not currently include any 
formal OWL restrictions that require this, but the spec should formalize 
it one way or another as part of addressing ISSUE-29.
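
To make the normal usage concrete, here is a minimal sketch in plain 
Python rather than RDF. The dictionary layout, the dimension names 
("area", "period") and the values are all invented for illustration; 
none of this is part of the QB vocabulary. It shows a slice whose key 
fixes one dimension, with the fixed value stated once on the slice 
(benefit 3) and copied back onto each observation when needed.

```python
# Hypothetical in-memory model of a keyed slice (not the QB vocabulary).
slice_key = ["area"]  # dimensions that the slice key declares as fixed

keyed_slice = {
    "key": slice_key,
    "fixed": {"area": "A1"},  # value stated once, on the slice
    "observations": [
        # abbreviated form: the fixed dimension is omitted here
        {"period": "2011", "value": 10},
        {"period": "2012", "value": 12},
    ],
}


def expand(slice_):
    """Return observations with the slice's fixed values copied in."""
    return [{**slice_["fixed"], **obs} for obs in slice_["observations"]]


expanded = expand(keyed_slice)
```

After expansion every observation carries the same "area" value, which 
is exactly the uniformity a slice key is meant to describe.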

When using Data Cube for measurement data, such as environmental 
information, we have found it to be useful to also have collections of 
observations which don't correspond to a specific qb:SliceKey but 
represent some hard-to-compute view across the data which is useful for 
presentational purposes (benefit #1). For example, in the Environment 
Agency publication of Bathing Water quality information there are sets 
representing the latest available value for all Bathing Waters [2]. So 
there is one value for each location but the time dimension is an 
implicit "latest available" rather than an explicit specific time.[3] 
That data uses subclasses of qb:Slice to represent such collections and 
so uses qb:observation to link to the observations themselves.
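
As a rough illustration of why such a collection has no natural slice 
key, the sketch below builds a "latest available" set from a handful of 
made-up observations. The field names, identifiers and values are 
assumptions for the example and are not taken from the Environment 
Agency data.

```python
# Invented sample data: one or more observations per location.
observations = [
    {"id": "obs1", "location": "beach-a", "time": "2012-08-01"},
    {"id": "obs2", "location": "beach-a", "time": "2012-08-15"},
    {"id": "obs3", "location": "beach-b", "time": "2012-08-08"},
]


def latest_per_location(observations):
    """Pick the most recent observation for each location."""
    latest = {}
    for obs in observations:
        current = latest.get(obs["location"])
        # ISO 8601 date strings compare chronologically as plain strings.
        if current is None or obs["time"] > current["time"]:
            latest[obs["location"]] = obs
    return list(latest.values())


latest = latest_per_location(observations)
# The members do not share a single time value, so "time" is not fixed.
distinct_times = {obs["time"] for obs in latest}
```

There is one member per location, but the time values differ between 
members, so no set of fixed dimensions describes the collection.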

So the issue here is whether we sanction and support such use cases or not.

I see two broad approaches, the first with three variants.

1.a Reject. A data cube is well formed only if there is a qb:SliceKey 
for each qb:Slice and if each observation within the qb:Slice has the 
same value for each fixed dimension, which can be attached to the 
qb:Slice in abbreviated mode.  An application is free to invent a new 
class to represent arbitrary collections of observations but cannot use 
qb:observation to link to them.

1.b As 1.a but generalize qb:observation to be open domain so that it 
could be reused in such circumstances.

1.c As 1.a but provide a qb:Collection class to use for such purposes 
and make the domain of qb:observation be the union of qb:Slice and 
qb:Collection (or open).

2. Allow. A qb:Slice is simply a collection of qb:Observations which are 
grouped together to aid data consumers. If a qb:Slice has a qb:SliceKey 
then all observations on a given slice should have the same value for 
every fixed dimension. However, a qb:Slice may be used to represent 
other collections of observations, and in those cases it lacks a 
qb:SliceKey.
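
The difference between the "reject" and "allow" readings can be 
sketched as a single check with a switch. This is only an illustration 
under assumed names: the dictionary layout, check_slice and the 
require_key flag are hypothetical and are not proposed spec text.

```python
def check_slice(slice_, require_key=False):
    """Return a list of problems; an empty list means the slice passes.

    require_key=True  approximates approach 1 (a slice key is mandatory).
    require_key=False approximates approach 2 (a slice key is optional).
    """
    key = slice_.get("key")
    if key is None:
        return ["slice has no slice key"] if require_key else []
    fixed = slice_.get("fixed", {})
    problems = []
    for dim in key:
        # In abbreviated form the value may live on the slice itself.
        values = {obs.get(dim, fixed.get(dim)) for obs in slice_["observations"]}
        if len(values) > 1:
            problems.append(f"dimension {dim!r} is not fixed across the slice")
    return problems


# Invented examples: a uniform slice, a broken one, and a keyless collection.
uniform = {"key": ["area"], "fixed": {"area": "A1"},
           "observations": [{"period": "2011"}, {"period": "2012"}]}
broken = {"key": ["area"],
          "observations": [{"area": "A1"}, {"area": "A2"}]}
collection = {"key": None,
              "observations": [{"area": "A1", "period": "2012"},
                               {"area": "A2", "period": "2011"}]}
```

Under the lenient setting the keyless collection passes; under the 
strict setting it is reported as lacking a slice key. The broken slice 
fails either way because its key claims a dimension that is not fixed.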

My preference is for 2, though I have a conflict of interest here.[4]

If that is a flexibility too far then I would prefer 1.c or 1.b, so that 
existing deployments that use qb:observation this way can at least 
continue to use that predicate.

Comments?

Dave

[1] http://www.w3.org/2011/gld/track/issues/33
(oops, should have put issue links in my earlier messages, too late)

[2] 
http://environment.data.gov.uk/data/bathing-water-quality/in-season/slice/latest

[3] The issue is not about how non-monotonic things can be represented in 
RDF; don't let that aspect of this example divert from the QB question.

[4] While it was not me that did that particular modelling, it is my 
company that publishes the Environment Agency data.
Received on Wednesday, 27 February 2013 18:03:27 UTC