Re: AW: [QB] ISSUE-33 Discussion and possible resolutions from Dave Reynolds on 2013-02-28 (public-gld-wg@w3.org from February 2013)

From: Dave Reynolds <dave.e.reynolds@gmail.com>
Date: Thu, 28 Feb 2013 17:05:05 +0000
To: Benedikt Kaempgen <kaempgen@fzi.de>
CC: Government Linked Data Working Group <public-gld-wg@w3.org>
Message-ID: <512F8E41.6000400@gmail.com>
Hi Benedikt,

On 28/02/13 16:28, Benedikt Kaempgen wrote:
> Hi,
>
>> 2. Allow. A qb:Slice is simply a collection of qb:Observations which are
>> grouped together to aid data consumers. If a qb:Slice has a qb:SliceKey
>> then all observations on a given slice should have the same value for
>> every fixed dimension. However, a qb:Slice may be used to represent
>> other collections of observations and in those cases lack a qb:SliceKey.
> I am also in favour of this option.

Good, that's two votes :)
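
As a rough illustration of the option 2 reading, here is a minimal Python sketch. It assumes a hypothetical in-memory model in which each observation is a dict of property values, and the `fixed_dimensions` argument stands in for the dimensions a qb:SliceKey would list. The function name and data layout are illustrative only and are not part of the vocabulary.

```python
# Hypothetical in-memory model: each observation maps property name -> value.
observations = {
    "observation1": {"lo_date": 2010, "lo_customer": "customer1", "obsValue": 20},
    "observation2": {"lo_date": 2010, "lo_customer": "customer2", "obsValue": 10},
}


def slice_is_consistent(members, fixed_dimensions=None):
    """Check a slice against the option 2 wording.

    fixed_dimensions stands in for the dimensions listed by a qb:SliceKey.
    None means the slice has no key and is treated as a plain collection.
    """
    if fixed_dimensions is None:
        return True  # no key: any grouping of observations is accepted
    for dim in fixed_dimensions:
        # Collect the distinct values the member observations have for dim.
        values = {observations[m].get(dim) for m in members}
        if len(values) > 1:
            return False  # a fixed dimension varies within the slice
    return True


members = ["observation1", "observation2"]
keyed_by_date = slice_is_consistent(members, ["lo_date"])          # both 2010
keyed_by_customer = slice_is_consistent(members, ["lo_customer"])  # values differ
unkeyed = slice_is_consistent(members)                             # no slice key
```

Under these assumptions the same pair of observations passes as a slice keyed on lo_date, fails as a slice keyed on lo_customer, and is always accepted when no key is given.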

> Can we clarify slice's usage in the case of adding aggregations of datasets?

Maybe ... there are limits to how much we can say about this given the
proposal to postpone ISSUE-30.

> Assume we have the following dataset:
>
> <dataset> qb:structure <datastructuredefinition>.
>
> <observation1> a qb:Observation.
> <observation1> lo_date 2010.
> <observation1> lo_customer <customer1>.
> <observation1> sdmx-measure:obsValue 20.
> <observation1> qb:dataSet <dataset>.
>
> <observation2> a qb:Observation.
> <observation2> lo_date 2010.
> <observation2> lo_customer <customer2>.
> <observation2> sdmx-measure:obsValue 10.
> <observation2> qb:dataSet <dataset>.
>
> ..and we would like to also publish aggregations from it, e.g.,
>
> <observation1and2> a qb:Observation.
> <observation1and2> lo_date 2010.
> <observation1and2> lo_customer <customerALL>.
> <observation1and2> sdmx-measure:obsValue 30.

Bear in mind that normally in SDMX and thus Data Cube one would do this 
with a hierarchy. I guess in this example that would mean having each 
customer as skos:narrower than <customerALL>, though the proposal for 
ISSUE-31 would allow other properties to be used.
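
To make the hierarchy idea concrete, here is a small Python sketch. It assumes a hypothetical `narrower` mapping that stands in for skos:narrower links from <customerALL> to the individual customers, and it assumes the aggregation is a plain sum, which the thread does not specify. All names are illustrative.

```python
# Hypothetical stand-in for skos:narrower links from the aggregate member.
narrower = {"customerALL": ["customer1", "customer2"]}

# Leaf observations from the example, keyed by (lo_date, lo_customer).
obs_values = {
    (2010, "customer1"): 20,
    (2010, "customer2"): 10,
}


def leaves(member):
    """Expand a hierarchy member into the leaf members beneath it."""
    children = narrower.get(member)
    if not children:
        return [member]  # no narrower members: this is already a leaf
    result = []
    for child in children:
        result.extend(leaves(child))
    return result


def rolled_up_value(date, member):
    """Sum the leaf observations under member for one date (sum is assumed)."""
    return sum(obs_values.get((date, leaf), 0) for leaf in leaves(member))


total = rolled_up_value(2010, "customerALL")
```

With the two observations from the example, the rolled-up value for <customerALL> in 2010 matches the obsValue of <observation1and2>.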

> If <customerALL> were not used in observations of <dataset>, this would create observations for a more arbitrary slice. In Data Warehousing scenarios, such a slice would often correspond to a "view" on the original dataset.
>
> Would we then recommend to:
>
> 1) Use the same dataset, i.e., add
>
> <observation1and2> qb:dataSet <dataset>.
>
> 2) Create a slice and use the same dataset, i.e., add
>
> <dataset> qb:slice <customerALLslice>.
> <customerALLslice> a qb:Slice.
> <customerALLslice> qb:observation <observation1and2>.
> <observation1and2> qb:dataSet <dataset>.
>
> 3) Create a slice and introduce a new dataset for aggregated observations, i.e., add
>
> <dataset> qb:slice <customerALLslice>.
> <customerALLslice> a qb:Slice.
> <customerALLslice> qb:observation <observation1and2>.
> <observation1and2> qb:dataSet <customerALLslicedataset>.
> <customerALLslicedataset> qb:structure <datastructuredefinition>.
>
> The slice would not necessarily have a qb:SliceKey since it would be one of the more flexible usages of a slice (view).

If I'm understanding this correctly then I think it is all legal, but
I'm not sure we want to go as far as explicitly recommending it in the
current version of the spec. It raises the question "how do you know
that's an aggregation, and how has it been aggregated?", which we are
proposing to postpone.

Dave

> ________________________________________
> From: Dave Reynolds [dave.e.reynolds@gmail.com]
> Sent: Wednesday, 27 February 2013 19:02
> To: Government Linked Data Working Group
> Subject: [QB] ISSUE-33 Discussion and possible resolutions
>
> ISSUE-33 [1] is about generalizing qb:slice.
>
> As it says in the current spec:
>
> """Slices allow us to group subsets of observations together. This not
> intended to represent arbitrary selections from the observations but
> uniform slices through the cube in which one or more of the dimension
> values are fixed."""
>
> Slices confer three benefits:
>    1. guides consuming applications in how to present the data
>    2. provides an identifier (e.g. for external annotation)
>    3. allows for a less bulky, abbreviated format
>
> Thus normal usage for qb:slice is that each slice is associated with a
> qb:SliceKey which in turn lists those dimensions that are fixed in the
> corresponding qb:slices. The vocabulary does not currently include any
> formal OWL restrictions that require this but the spec should formalize
> it one way or another as part of addressing ISSUE-29.
>
> When using Data Cube for measurement data, such as environmental
> information, we have found it to be useful to also have collections of
> observations which don't correspond to a specific qb:SliceKey but
> represent some hard-to-compute view across the data which is useful for
> presentational purposes (benefit #1). For example, in the Environment
> Agency publication of Bathing Water quality information there are sets
> representing the latest available value for all Bathing Waters [2]. So
> there is one value for each location but the time dimension is an
> implicit "latest available" rather than an explicit specific time.[3]
> That data uses sub classes of qb:Slice to represent such collections and
> so uses qb:observation to link to the observations themselves.
>
> So the issue here is whether we sanction and support such use cases or not.
>
> I see N possible approaches.
>
> 1.a Reject. A data cube is well formed only if there is a qb:SliceKey
> for each qb:Slice and if each observation within the qb:Slice has the
> same value for each fixed dimension, which can be attached to the
> qb:Slice in abbreviated mode.  An application is free to invent a new
> class to represent arbitrary collections of observations but cannot use
> qb:observation to link to them.
>
> 1.b As 1.a but generalize qb:observation to be open domain so that it
> could be reused in such circumstances.
>
> 1.c As 1.a but provide a qb:Collection class to use for such purposes
> and make the domain of qb:observation be the union of qb:Slice and
> qb:Collection (or open).
>
> 2. Allow. A qb:Slice is simply a collection of qb:Observations which are
> grouped together to aid data consumers. If a qb:Slice has a qb:SliceKey
> then all observations on a given slice should have the same value for
> every fixed dimension. However, a qb:Slice may be used to represent
> other collections of observations and in those cases lack a qb:SliceKey.
>
> My preference is for 2, though I have a conflict of interest here.[4]
>
> If that is a flexibility too far then I would prefer 1.c or 1.b so that
> existing deployments that use qb:observation this way can continue to at
> least use that predicate in this way.
>
> Comments?
>
> Dave
>
> [1] http://www.w3.org/2011/gld/track/issues/33
> (oops, should have put issue links in my earlier messages, too late)
>
> [2]
> http://environment.data.gov.uk/data/bathing-water-quality/in-season/slice/latest
>
> [3] The issue is not about how non-monotonic things can be represented in
> RDF, don't let that aspect of this example divert from the QB question.
>
> [4] While it was not me that did that particular modelling it is my
> company that publishes the Environment Agency data.
>
Received on Thursday, 28 February 2013 17:05:38 UTC