Re: [QB] ISSUE-33 Discussion and possible resolutions from Benedikt Kaempgen on 2013-02-28 (public-gld-wg@w3.org from February 2013)

From: Benedikt Kaempgen <kaempgen@fzi.de>
Date: Thu, 28 Feb 2013 16:28:36 +0000
To: Dave Reynolds <dave.e.reynolds@gmail.com>, "Government Linked Data Working Group" <public-gld-wg@w3.org>
Message-ID: <0D7BFFD7C415144DA75C3D49C46AC21512AB2FDF@ex-ms-1a.fzi.de>
Hi,

>2. Allow. A qb:Slice is simply a collection of qb:Observations which are
>grouped together to aid data consumers. If a qb:Slice has a qb:SliceKey
>then all observations on a given slice should have the same value for
>every fixed dimension. However, a qb:Slice may be used to represent
>other collections of observations and in those cases lack a qb:SliceKey.
I am also in favour of this option.

Can we clarify how slices should be used when aggregations of a dataset are added?

Assume we have the following dataset:

<dataset> qb:structure <datastructuredefinition>.

<observation1> a qb:Observation.
<observation1> lo_date 2010.
<observation1> lo_customer <customer1>.
<observation1> sdmx-measure:obsValue 20.
<observation1> qb:dataSet <dataset>.

<observation2> a qb:Observation.
<observation2> lo_date 2010.
<observation2> lo_customer <customer2>.
<observation2> sdmx-measure:obsValue 10.
<observation2> qb:dataSet <dataset>.

...and we would also like to publish aggregations of it, e.g.:

<observation1and2> a qb:Observation.
<observation1and2> lo_date 2010.
<observation1and2> lo_customer <customerALL>.
<observation1and2> sdmx-measure:obsValue 30.
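
A minimal Python sketch of how such an aggregated observation could be derived, assuming the observations are held as plain dictionaries. The dictionary layout and the function name roll_up_customers are hypothetical and only mirror the triples above; they are not part of the Data Cube vocabulary.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for <observation1> and <observation2>.
observations = [
    {"lo_date": 2010, "lo_customer": "customer1", "obsValue": 20},
    {"lo_date": 2010, "lo_customer": "customer2", "obsValue": 10},
]


def roll_up_customers(observations, all_member="customerALL"):
    """Sum obsValue per lo_date and replace lo_customer with an 'all' member."""
    totals = defaultdict(int)
    for obs in observations:
        totals[obs["lo_date"]] += obs["obsValue"]
    # One aggregated observation per date, in ascending date order.
    return [
        {"lo_date": date, "lo_customer": all_member, "obsValue": total}
        for date, total in sorted(totals.items())
    ]


aggregated = roll_up_customers(observations)
```

For the two observations above this yields a single entry for 2010 with the value 30, corresponding to <observation1and2>.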

If <customerALL> is not used in any other observation of <dataset>, this would create observations for a more arbitrary slice. In data warehousing scenarios, such a slice would often correspond to a "view" on the original dataset.

Would we then recommend one of the following?

1) Use the same dataset, i.e., add

<observation1and2> qb:dataSet <dataset>.

2) Create a slice and use the same dataset, i.e., add

<dataset> qb:slice <customerALLslice>.
<customerALLslice> a qb:Slice.
<customerALLslice> qb:observation <observation1and2>. 
<observation1and2> qb:dataSet <dataset>. 

3) Create a slice and introduce a new dataset for aggregated observations, i.e., add

<dataset> qb:slice <customerALLslice>.
<customerALLslice> a qb:Slice.
<customerALLslice> qb:observation <observation1and2>. 
<observation1and2> qb:dataSet <customerALLslicedataset>. 
<customerALLslicedataset> qb:structure <datastructuredefinition>.

The slice would not necessarily have a qb:SliceKey since it would be one of the more flexible usages of a slice (view).

Best,

Benedikt

________________________________________
From: Dave Reynolds [dave.e.reynolds@gmail.com]
Sent: Wednesday, 27 February 2013 19:02
To: Government Linked Data Working Group
Subject: [QB] ISSUE-33 Discussion and possible resolutions

ISSUE-33 [1] is about generalizing qb:slice.

As it says in the current spec:

"""Slices allow us to group subsets of observations together. This not
intended to represent arbitrary selections from the observations but
uniform slices through the cube in which one or more of the dimension
values are fixed."""

Slices confer three benefits:
  1. they guide consuming applications in how to present the data
  2. they provide an identifier (e.g. for external annotation)
  3. they allow for a less bulky, abbreviated format

Thus normal usage for qb:slice is that each slice is associated with a
qb:SliceKey which in turn lists those dimensions that are fixed in the
corresponding qb:slices. The vocabulary does not currently include any
formal OWL restrictions that require this but the spec should formalize
it one way or another as part of addressing ISSUE-29.

When using Data Cube for measurement data, such as environmental
information, we have found it to be useful to also have collections of
observations which don't correspond to a specific qb:SliceKey but
represent some hard-to-compute view across the data which is useful for
presentational purposes (benefit #1). For example, in the Environment
Agency publication of Bathing Water quality information there are sets
representing the latest available value for all Bathing Waters [2]. So
there is one value for each location but the time dimension is an
implicit "latest available" rather than an explicit specific time.[3]
That data uses sub classes of qb:Slice to represent such collections and
so uses qb:observation to link to the observations themselves.
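
A minimal Python sketch of such a "latest available" collection, assuming observations are plain dictionaries with hypothetical keys "location", "time" (ISO date strings) and "value". It is only meant to show why no single fixed time value fits this kind of collection; it is not how the Environment Agency data is actually produced.

```python
def latest_per_location(observations):
    """Keep, for each location, the observation with the greatest time value."""
    latest = {}
    for obs in observations:
        current = latest.get(obs["location"])
        # ISO date strings compare correctly as plain strings.
        if current is None or obs["time"] > current["time"]:
            latest[obs["location"]] = obs
    return list(latest.values())


# Hypothetical sample data: two locations sampled on different days.
samples = [
    {"location": "beachA", "time": "2012-08-01", "value": 1},
    {"location": "beachA", "time": "2012-08-15", "value": 2},
    {"location": "beachB", "time": "2012-08-08", "value": 3},
]

latest_view = latest_per_location(samples)
# The selected observations do not share one time value,
# so time cannot be listed as a fixed dimension for this collection.
distinct_times = {obs["time"] for obs in latest_view}
```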

So the issue here is whether we sanction and support such use cases or not.

I see four possible approaches.

1.a Reject. A data cube is well formed only if there is a qb:SliceKey
for each qb:Slice and if each observation within the qb:Slice has the
same value for each fixed dimension, which can be attached to the
qb:Slice in abbreviated mode.  An application is free to invent a new
class to represent arbitrary collections of observations but cannot use
qb:observation to link to them.

1.b As 1.a but generalize qb:observation to be open domain so that it
could be reused in such circumstances.

1.c As 1.a but provide a qb:Collection class to use for such purposes
and make the domain of qb:observation be the union of qb:Slice and
qb:Collection (or open).

2. Allow. A qb:Slice is simply a collection of qb:Observations which are
grouped together to aid data consumers. If a qb:Slice has a qb:SliceKey
then all observations on a given slice should have the same value for
every fixed dimension. However, a qb:Slice may be used to represent
other collections of observations and in those cases lack a qb:SliceKey.
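
The constraint shared by these options can be sketched in Python as follows. This is a simplified illustration, assuming every observation carries its dimension values explicitly (i.e. not the abbreviated form) and is held as a plain dictionary; the function name slice_violations and the dimension names "period" and "area" are hypothetical.

```python
def slice_violations(slice_observations, fixed_dimensions):
    """Return the fixed dimensions whose values differ across the slice.

    fixed_dimensions plays the role of the list a qb:SliceKey would declare.
    Pass None for a slice without a key: nothing is checked, which
    corresponds to the "Allow" reading in option 2.
    """
    if fixed_dimensions is None:
        return []
    violations = []
    for dim in fixed_dimensions:
        values = {obs.get(dim) for obs in slice_observations}
        if len(values) > 1:
            violations.append(dim)
    return violations


# Hypothetical observations: "period" is meant to be fixed, "area" varies.
keyed_ok = [
    {"period": 2010, "area": "area1"},
    {"period": 2010, "area": "area2"},
]
keyed_bad = keyed_ok + [{"period": 2011, "area": "area1"}]

ok_result = slice_violations(keyed_ok, ["period"])
bad_result = slice_violations(keyed_bad, ["period"])
unkeyed_result = slice_violations(keyed_bad, None)
```

Under option 1.a the unkeyed case would instead be reported as not well formed; under option 2 it is accepted as an arbitrary collection.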

My preference is for 2, though I have a conflict of interest here.[4]

If that is a flexibility too far then I would prefer 1.c or 1.b, so that
existing deployments that use qb:observation in this way can at least
continue to use that predicate.

Comments?

Dave

[1] http://www.w3.org/2011/gld/track/issues/33
(oops, should have put issue links in my earlier messages, too late)

[2]
http://environment.data.gov.uk/data/bathing-water-quality/in-season/slice/latest

[3] The issue is not about how non-monotonic things can be represented in
RDF; don't let that aspect of this example divert from the QB question.

[4] While it was not me that did that particular modelling it is my
company that publishes the Environment Agency data.
Received on Thursday, 28 February 2013 16:29:05 UTC