Re: QB Data Cube Dicing. Was: Coverage subgroup update from Bill Roberts on 2016-07-21 (public-sdw-wg@w3.org from July 2016)

From: Bill Roberts <bill@swirrl.com>
Date: Thu, 21 Jul 2016 10:50:17 +0100
To: "Little, Chris" <chris.little@metoffice.gov.uk>
Cc: "public-sdw-wg@w3.org" <public-sdw-wg@w3.org>
Message-ID: <CAMTVsumYFWmEJ5wVCdT7ELXZjjjmLEfhTBHx2UOQ0=PwoYfnrg@mail.gmail.com>
Hi Chris

This certainly sounds interesting to me and relevant.

As you say, the RDF Data Cube vocabulary has largely come out of the
statistical community and draws heavily on SDMX.  That means that in most
cases QB dimensions are lists of things or concepts - importantly for
tiling, these typically have no defined order, hence the limitation of
formal 'subsetting' in QB to slices.

A common example of a dimension might be breaking down a population by
ethnicity.  The spatial variation is most commonly represented as a one
dimensional unordered list of area URIs (countries, or administrative
districts or whatever).  The values of a time dimension are most often
interval URIs - which are orderable if the start and end of the intervals
are defined, but not through any standard mechanism of QB.

So enabling dicing with QB as it currently stands would require the subset
of the possible values of a dimension to be listed explicitly.  A useful extension to QB
might therefore be to look at standard approaches to defining order and
orderable dimensions.
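
A minimal sketch of that distinction, in plain Python rather than any QB
library. Observations are simple dicts; the dimension names (`area`,
`period`), the helper functions and the explicit `order` list are all
hypothetical, standing in for whatever ordering mechanism an extension might
define.

```python
# Hypothetical observations: each dict maps dimension names to values.
observations = [
    {"area": "E1", "period": "2014", "value": 10},
    {"area": "E1", "period": "2015", "value": 12},
    {"area": "E2", "period": "2014", "value": 7},
    {"area": "E2", "period": "2015", "value": 9},
    {"area": "E3", "period": "2016", "value": 4},
]


def dice_by_list(obs, dimension, allowed):
    """Dice on an unordered dimension: the wanted values must be enumerated."""
    allowed = set(allowed)
    return [o for o in obs if o[dimension] in allowed]


def dice_by_range(obs, dimension, order, first, last):
    """Dice on an ordered dimension: two endpoints are enough.

    `order` is an explicit list giving the assumed ordering of the values.
    """
    rank = {v: i for i, v in enumerate(order)}
    lo, hi = rank[first], rank[last]
    return [o for o in obs if lo <= rank[o[dimension]] <= hi]


# Unordered dimension: every area in the subset is listed.
areas = dice_by_list(observations, "area", ["E1", "E3"])

# Ordered dimension: the subset is described by its first and last value.
periods = dice_by_range(
    observations, "period", ["2014", "2015", "2016"], "2015", "2016"
)
```

With an agreed order, the second form describes a tile by its bounds instead
of by its full membership, which is what a tiling scheme would need.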

From Swirrl's work on statistical data, we've found various situations
around data presentation in a web page, where being able to order things
properly would be useful, so even if the main motivation for looking at
this was to do dicing/tiling, there could be some handy spin-offs for the
kind of data already well served by QB.

We've made some of our own workarounds, for example adding ui:sortPriority
triples to concepts in a concept scheme, to allow a consistent and specific
ordering of dimension values to be defined when needed, eg when the
statisticians have some convention for how a particular quantity should be
presented in a table or whatever.
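
As a rough illustration of that kind of workaround, in plain Python with no
RDF library: the concept URIs, labels and priority values below are made up,
and the rule that a lower priority sorts first (with unprioritised concepts
placed last) is an assumption for the sketch, not a description of how
ui:sortPriority is actually interpreted.

```python
# Hypothetical concepts from a concept scheme. "sortPriority" stands in for
# the value of a ui:sortPriority triple attached to each concept.
concepts = [
    {"uri": "http://example.org/age/65-plus", "label": "65 and over", "sortPriority": 3},
    {"uri": "http://example.org/age/under-16", "label": "Under 16", "sortPriority": 1},
    {"uri": "http://example.org/age/not-stated", "label": "Not stated"},
    {"uri": "http://example.org/age/16-64", "label": "16 to 64", "sortPriority": 2},
]


def presentation_order(items):
    """Sort by priority (assumed: lower first); unprioritised items go last,
    ordered by label so the result is deterministic."""
    def key(concept):
        priority = concept.get("sortPriority")
        return (priority is None, priority if priority is not None else 0, concept["label"])
    return sorted(items, key=key)


labels = [c["label"] for c in presentation_order(concepts)]
```

Sorting the same labels alphabetically would put "Under 16" last, which is
the sort of case where an explicit priority helps.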

I'd be willing to contribute to this, but just in terms of workload, I
think I'd have to look to others (you?) to lead and push it.

Cheers

Bill




On 20 July 2016 at 18:31, Little, Chris <chris.little@metoffice.gov.uk>
wrote:

> Rob, Jon, Simon, Josh, Bill and colleagues,
>
>
>
> Apologies for spinning off another thread, but this seems a good time and
> place. Kick me well into touch if you wish.
>
>
>
> I have been interested in sub-setting data cubes, as a potentially
> scalable, sustainable approach to supporting large numbers of users/clients
> on lightweight devices. Think generalisation of map tiles to:
>
> a)      Point clouds, vectors, 3D geometries;
>
> b)      N dimensional map tiles, including non-spatial and non-temporal
> dimensions;
>
> c)       Pokemon-Go-Cov;
>
> d)      The WindAR proof of concept from me, Mike Reynolds and Christine
> Perey a couple of years ago;
>
> e)      RDF QB model ‘diced’ as well as ‘sliced’
>
> f)       Etc.
>
>
>
> I thought that the QB model would have enough generality but was
> disappointed to find slices only (but pleased at the simplicity, rigour and
> generality). There was a move in W3C to have some more granularity, but I
> understand that that was driven by the statistical spreadsheet ISO people
> in the direction of pivot tables and temporal summaries, and quite rightly
> failed.
>
>
>
> I would like to increase the generality in the direction of dicing as I
> said. For example, having sliced an n-D cube across a dimension to obtain
> an (n-1)-D cube, it could be still too big, so tile it/pre-format/dice once
> at server side. Map tile sets are the traditional example.
>
>
>
> I think and hope we should be able to rattle off a reasonably good
> extension of QB as a general (non-spatial) concept, and then produce some
> convincing use cases or examples, including spatial and temporal, to make
> it worthwhile.
>
>
>
> Roger Brackin and I failed miserably to get much traction with an OGC SWG
> last year, but I now see many more implementations coercing map tiles, in
> both 2-D and 3-D, for rasters, point clouds, vectors, geometry and more, to
> disseminate or give access to big data. Of course, many Met Ocean use cases
> are for n-D gridded data, where n is 3,4,5,6, …, etc.
>
>
>
> So what do you think?
>
>
>
> Chris
>
>
>
>
>
Received on Thursday, 21 July 2016 09:50:47 UTC