Re: QB Data Cube Dicing. Was: Coverage subgroup update

I'm not sure we need to define anything new here, unless we want to create
some new formal axioms about classes of dimensions that are ordered?

Extracted from the QB spec:

"A component property encapsulates several pieces of information:

   - the concept being represented (e.g. time or geographic area),
   - the nature of the component (dimension, attribute or measure) as
   represented by the type of the component property,
   - the type or code list used to represent the value.

The same *concept* can be manifested in different components. For example,
the concept of *currency* may be used as a dimension (in a data set dealing
with exchange rates) or as an attribute (when describing the currency in
which an observed trade took place). The concept of time is typically used
only as a dimension but may be encoded as a data value (e.g. an xsd:dateTime)
or as a symbolic value (e.g. a URI drawn from the reference time URI set
developed by data.gov.uk). In statistical agencies it is common to have a
standard thesaurus of statistical concepts which underpin the components
used in multiple different data sets.

To support this reuse of general statistical concepts the data cube
vocabulary provides the qb:concept
<https://www.w3.org/TR/vocab-data-cube/#dfn-qb-concept> property which
links a qb:ComponentProperty
<https://www.w3.org/TR/vocab-data-cube/#dfn-qb-componentproperty-1> to the
concept it represents. We use the SKOS vocabulary [SKOS-PRIMER
<https://www.w3.org/TR/vocab-data-cube/#bib-SKOS-PRIMER>] to represent such
concepts. "


and an example:

eg:refPeriod  a rdf:Property, qb:DimensionProperty;
    rdfs:label "reference period"@en;
    rdfs:subPropertyOf sdmx-dimension:refPeriod;
    rdfs:range interval:Interval;
    qb:concept sdmx-concept:refPeriod .

From my reading, the data type specified by the rdfs:range property of a
dimension tells a client whether it is ordered or not - and dereferencing
the dimension's URI to get an OWL model (using content negotiation) seems a
basic requirement. All we need to do, IMHO, is point out how this should be
done and hence used.
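
As a minimal sketch (eg:exampleDSD is a made-up URI, and this assumes the
dimension definitions have been dereferenced and loaded alongside the cube),
a client could list the dimensions of a cube and their declared ranges with
something like:

PREFIX qb:   <http://purl.org/linked-data/cube#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX eg:   <http://example.org/ns#>

# List each dimension of a (hypothetical) structure definition together
# with its declared range, so a client can decide which dimensions
# support ordering and comparison.
SELECT ?dim ?range
WHERE {
  eg:exampleDSD qb:component ?comp .
  ?comp qb:dimension ?dim .
  OPTIONAL { ?dim rdfs:range ?range }
}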

The SKOS binding for qb:concept lets us attach more semantics about the
meaning of the dimension, above and beyond its data type, which implies the
operations it supports.
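
Again only a sketch (the data here is hypothetical, but the properties are
the standard QB and SKOS ones), a client could follow qb:concept to reach
those extra semantics:

PREFIX qb:   <http://purl.org/linked-data/cube#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

# For each dimension, fetch the SKOS concept it represents, plus its
# label and any broader concepts - semantics beyond the bare data type.
SELECT ?dim ?concept ?label ?broader
WHERE {
  ?dim a qb:DimensionProperty ;
       qb:concept ?concept .
  OPTIONAL { ?concept skos:prefLabel ?label }
  OPTIONAL { ?concept skos:broader ?broader }
}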

Do we need more than this, structurally? Having some recommended data
types for common temporal concepts, based on the Time ontology, seems a
good best-practice layer on top.
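
For instance (purely illustrative - eg:exampleDataset is made up, and this
assumes refPeriod values are OWL-Time Intervals whose beginnings carry
xsd:dateTime values, as in the spec example above), ordering along the
temporal dimension then falls out of standard SPARQL:

PREFIX qb:   <http://purl.org/linked-data/cube#>
PREFIX time: <http://www.w3.org/2006/time#>
PREFIX eg:   <http://example.org/ns#>

# Order the observations of a (hypothetical) dataset along the reference
# period dimension; the interval's beginning supplies a comparable value.
SELECT ?obs ?start
WHERE {
  ?obs a qb:Observation ;
       qb:dataSet eg:exampleDataset ;
       eg:refPeriod ?interval .
  ?interval time:hasBeginning/time:inXSDDateTime ?start .
}
ORDER BY ?start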

To me this all seems straightforward enough.

I'm more interested in the following challenges:

1) How to describe coordinates - is this a set of dimensions that share a
common range data type, or a single dimension? Do we need to show usage, or
define a class?

2) How to describe subsetting on a set of dimensions, relative to the
dimensions of the original dataset? (A rough SPARQL sketch follows the
list.)

3) How to handle N-tree-style nested spatial dimensions, such as DGGS,
What3words etc. Here again I think some superclass of such dimensions
probably needs defining, or at least the data type that allows an
rdfs:range to be interpreted.
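
On (2), here is a rough sketch of what a 'dice' might look like in vanilla
QB plus SPARQL (all of the eg: dimension and measure URIs below are made up
for illustration): restrict several dimensions at once - a lat/lon window
and a time window - rather than fixing a single dimension value as a
qb:Slice does.

PREFIX qb:  <http://purl.org/linked-data/cube#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX eg:  <http://example.org/ns#>

# A "dice": select the observations of a (hypothetical) dataset whose
# spatial and temporal dimension values fall inside a given window.
SELECT ?obs ?lat ?lon ?t ?value
WHERE {
  ?obs a qb:Observation ;
       qb:dataSet       eg:exampleDataset ;
       eg:latitude      ?lat ;
       eg:longitude     ?lon ;
       eg:refTime       ?t ;
       eg:measuredValue ?value .
  FILTER (?lat >= 50.0 && ?lat <= 55.0)
  FILTER (?lon >= -2.0 && ?lon <= 2.0)
  FILTER (?t >= "2016-07-01T00:00:00Z"^^xsd:dateTime &&
          ?t <  "2016-07-02T00:00:00Z"^^xsd:dateTime)
}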

Rob





On Thu, 21 Jul 2016 at 21:16 Little, Chris <chris.little@metoffice.gov.uk>
wrote:

> Bill,
>
>
>
> Thanks for being positive!
>
>
>
> I suspect we are in safe territory, because the mathematics of ordered
> sets, and partially ordered sets, have been well understood for a century
> or so.
>
>
>
> We can probably do things with just < and > (and therefore just ≥).
>
>
>
> Actually, I suspect most ordering that we would require would sit very
> nicely with Allen’s temporal algebra.
>
>
>
> Chris
>
>
>
> *From:* Bill Roberts [mailto:bill@swirrl.com]
> *Sent:* Thursday, July 21, 2016 11:29 AM
> *To:* Little, Chris
> *Cc:* Simon Cox; public-sdw-wg@w3.org
>
>
> *Subject:* Re: QB Data Cube Dicing. Was: Coverage subgroup update
>
>
>
> Hi Chris
>
> Don't worry, I wasn't suggesting we should give up!  Yes, I think the
> solution would be to propose and define a new OrderedDimension class and to
> consider how to define how order would be determined.
>
> Clearly, for a spatial dimension such as latitude, that could be a simple
> numerical ordering.  And if a time dimension has values as xsd:dateTime
> then SPARQL is able to order them.  Maybe a more generic ordering approach
> could be useful for cases eg where the values of the time dimension are
> interval URIs.  Maybe we don't worry about that and restrict values of an
> OrderedDimension to be things that behave correctly with arithmetical >, =,
> <
>
> Bill
>
>
>
>
>
> On 21 July 2016 at 11:21, Little, Chris <chris.little@metoffice.gov.uk>
> wrote:
>
> Simon, Bill,
>
>  OK. Not yet giving up, getting claw hammer and pincers out to extract
> Simon’s well hit nail.
>
>  Maybe we need an OrderedQB, or at least an OrderedDimension concept?
>
>  Surely some QB dimensions do have order, such as time?
>
>  Then we just restrict the dicing to those dimensions that have an
> intrinsic ordering, or even an imposed arbitrary ordering?
>
>  An example of the latter would be what the OGC Met Ocean Domain WG did
> with the Best Practice for specifying a Web Map Services for weather
> forecast Ensembles? An ensemble of, say, 60 simultaneous forecasts has no
> inherent ordering as all 60 forecasts are, a priori, equally likely. So
> they form a set. A set can be partitioned. E.g. 12 subsets of 5 each, and
> these are arbitrary, but convenient for handling. They are labelled 1-60 or
> 0-59, albeit slightly misleadingly, in the Best Practice.
>
>  So maybe it is not quite as simple as first thought, but still not
> intractable?
>
>  Chris
>
>  *From:* Bill Roberts [mailto:bill@swirrl.com]
> *Sent:* Thursday, July 21, 2016 10:59 AM
> *To:* Simon Cox
> *Cc:* public-sdw-wg@w3.org
> *Subject:* Re: QB Data Cube Dicing. Was: Coverage subgroup update
>
>  Hi Simon - I sent my reply to Chris before reading your comment on
> ordering - yes you've hit the nail on the head.  QB dimensions are not in
> general ordered and there is currently no standard approach in QB for
> defining the order.
>
>  Cheers
>
>  Bill
>
>  On 21 July 2016 at 05:03, <Simon.Cox@csiro.au> wrote:
>
> In QB are the elements in a dimension always _ordered_? Dicing would
> require that I suppose.
>
>  *From:* Rob Atkinson [mailto:rob@metalinkage.com.au]
> *Sent:* Thursday, 21 July 2016 10:39 AM
> *To:* Cox, Simon (L&W, Clayton) <Simon.Cox@csiro.au>;
> chris.little@metoffice.gov.uk; j.d.blower@reading.ac.uk;
> jlieberman@tumblingwalls.com; bill@swirrl.com; public-sdw-wg@w3.org
> *Cc:* m.riechert@reading.ac.uk; roger.brackin@envitia.com;
> cperey@perey.com
> *Subject:* Re: QB Data Cube Dicing. Was: Coverage subgroup update
>
>  I also think there is a lot that can be done using out-of-the-box QB, +
> some other standards  - because dimensions can be specified against domain
> and range - and its possible to define virtual subsets against these.
>
>  For example - here is an approach to automate the generation/materialisation
> of those virtual subsets:  http://opencube-toolkit.eu/
>
>  So, _some_ relationships between subsets can be expressed based on the
> dimensions - for example a codedDimension could have subsets defined by the
> SKOS hierarchy of the bounded codelist. Specialisations of a
> codedDimension that restrict the range to a specific Concept get a
> declaration of how they relate to other dimensions through the existing
> standardised mechanism.
>
>  This works for free for nested Features and time codes.
>
>  If we had equivalent semantics for granules of spatial and temporal
> coordinate space and articulated a best practice here I suspect we might
> not need to extend QB at all - though its possible we might want to do so
> to promote some convenient inferences for example those that otherwise
> require OWL reasoning over the domain model.
>
>  Let's try to work up a hit list of the most useful types of dimensions and
> the subsetting we would want to declare, and see if we can express it in
> vanilla QB first.
>
>  I'll create some stubs for a hierarchy from abstract to concrete best
> practices and have a go at what some might look like
>
>  I'll also offer to curate the list (as a proof of concept I'm setting up
> a registry of dimension specifications with view/profile-based content
> negotiation and reasoning support behind it, so that inferencing and
> querying over complex dimension specialisation chains can be done
> server-side to make it easy for clients).  We can decide if we want to make
> this a published resource, and/or an active registry, if it is deemed
> useful.
>
>  Rob
>
>  On Thu, 21 Jul 2016 at 09:27 <Simon.Cox@csiro.au> wrote:
>
> Ø  I think and hope we should be able to rattle off a reasonably good
> extension of QB as a general (non-spatial) concept, and then produce some
> convincing use cases or examples, including spatial and temporal, to make
> it worthwhile.
>
> +1
>
> This is exactly the direction to take it – a small extension to deal with
> a discrete issue.
>
> *From:* Little, Chris [mailto:chris.little@metoffice.gov.uk]
> *Sent:* Thursday, 21 July 2016 3:31 AM
> *To:* Jon Blower <j.d.blower@reading.ac.uk>; Cox, Simon (L&W, Clayton) <
> Simon.Cox@csiro.au>; Joshua Lieberman <jlieberman@tumblingwalls.com>;
> bill@swirrl.com; public-sdw-wg@w3.org
> *Cc:* m.riechert@reading.ac.uk; Roger Brackin <roger.brackin@envitia.com>;
> Christine Perey (cperey@perey.com) <cperey@perey.com>
> *Subject:* QB Data Cube Dicing. Was: Coverage subgroup update
>
>  Rob, Jon, Simon, Josh, Bill and colleagues,
>
>  Apologies for spinning off another thread, but this seems a good time and
> place. Kick me well into touch if you wish.
>
>  I have been interested in sub-setting data cubes, as a potentially
> scalable, sustainable approach to supporting large numbers of users/clients
> on lightweight devices. Think generalisation of map tiles to:
>
> a)      Point clouds, vectors, 3D geometries;
>
> b)      N dimensional map tiles, including non-spatial and non-temporal
> dimensions;
>
> c)       Pokemon-Go-Cov;
>
> d)      The WindAR proof of concept from me, Mike Reynolds and Christine
> Perey a couple of years ago;
>
> e)      RDF QB model ‘diced’ as well as ‘sliced’
>
> f)       Etc.
>
>  I thought that the QB model would have enough generality but was
> disappointed to find slices only (but pleased at the simplicity, rigour and
> generality). There was a move in W3C to have some more granularity, but I
> understand that was driven by the statistical spreadsheet ISO people
> in the direction of pivot tables and temporal summaries, and quite rightly
> failed.
>
>  I would like to increase the generality in the direction of dicing as I
> said. For example, having sliced an n-D cube across a dimension to obtain
> an (n-1)-D cube, it could be still too big, so tile it/pre-format/dice once
> at server side. Map tile sets are the traditional example.
>
>  I think and hope we should be able to rattle off a reasonably good
> extension of QB as a general (non-spatial) concept, and then produce some
> convincing use cases or examples, including spatial and temporal, to make
> it worthwhile.
>
>  Roger Brackin and I failed miserably to get much traction with an OGC SWG
> last year, but I now see many more implementations coercing map tiles, in
> both 2-D and 3-D, for rasters, point clouds, vectors, geometry and more, to
> disseminate or give access to big data. Of course, many Met Ocean use cases
> are for n-D gridded data, where n is 3,4,5,6, …, etc.
>
>  So what do you think?
>
>  Chris
>
>  *From:* Jon Blower [mailto:j.d.blower@reading.ac.uk
> <j.d.blower@reading.ac.uk>]
> *Sent:* Wednesday, July 20, 2016 12:50 AM
> *To:* Simon.Cox@csiro.au; bill@swirrl.com; public-sdw-wg@w3.org
> *Cc:* m.riechert@reading.ac.uk
> *Subject:* Re: Coverage subgroup update
>
>  Hi Simon,
>
>  Ø  QB provides a data model that allows you to express sub-setting
> operations in SPARQL. That looks like a win to me. I.e. think of QB as an
> API, not a payload.
>
>  I’m not an expert in QB by any means, but I understand that the
> subsetting in QB essentially means taking a Slice (in their terminology),
> which is a rather limited kind of subset. I didn’t see a way of taking
> arbitrary subsets (e.g. by geographic coordinates) in the way that WCS
> could. Can you expand on this, perhaps giving some examples of different
> subset types that can be expressed in SPARQL using QB?
>
>  Cheers,
> Jon
>
>  *From: *"Simon.Cox@csiro.au" <Simon.Cox@csiro.au>
> *Date: *Wednesday, 20 July 2016 00:02
> *To: *"bill@swirrl.com" <bill@swirrl.com>, "public-sdw-wg@w3.org" <
> public-sdw-wg@w3.org>
> *Cc: *Maik Riechert <m.riechert@reading.ac.uk>, Jon Blower <
> sgs02jdb@reading.ac.uk>
> *Subject: *RE: Coverage subgroup update
>
>  Ø  The main potential drawback of the RDF Data Cube approach in this
> context is its verbosity for large coverages.
>
>  For sure. You wouldn’t want to deliver large coverages serialized as RDF.
>
>  **But** - QB provides a data model that allows you to express
> sub-setting operations in SPARQL. That looks like a win to me. I.e. think
> of QB as an API, not a payload.
>
>  *From:* Bill Roberts [mailto:bill@swirrl.com <bill@swirrl.com>]
> *Sent:* Wednesday, 20 July 2016 6:42 AM
> *To:* public-sdw-wg@w3.org
> *Cc:* Maik Riechert <m.riechert@reading.ac.uk>; Jon Blower <
> j.d.blower@reading.ac.uk>
> *Subject:* Coverage subgroup update
>
>  Hi all
>
>  Sorry for being a bit quiet on this over the last month or so - it was as
> a result of a combination of holiday and other commitments.
>
>  However, some work on the topic has been continuing.  Here is an update
> for discussion in the SDW plenary call tomorrow.
>
>  In particular I had a meeting in Reading on 5 July with Jon Blower and
> fellow-editor Maik Riechert.
>
>  During that we came up with a proposed approach that I would like to put
> to the group.  The essence of this is that we take the CoverageJSON
> specification of Maik and Jon and put it forward as a potential W3C/OGC
> recommendation.  See
> https://github.com/covjson/specification/blob/master/spec.md for the
> current status of the CoverageJSON specification.
>
>  That spec is still work in progress and we identified a couple of areas
> where we know we'll want to add to it, notably around a URI convention for
> identifying an extract of a gridded coverage, including the ability to
> identify a single point within a coverage. (Some initial discussion of this
> issue at https://github.com/covjson/specification/issues/66).
>
>  Maik and Jon understandably feel that it is for others to judge whether
> their work is an appropriate solution to the requirements of the SDW
> group.  My opinion from our discussions and initial review of our
> requirements is that it is indeed a good solution and I hope I can be
> reasonably objective about that.
>
>  My intention is to work through the requirements from the UCR again and
> systematically test and cross-reference them to parts of the CovJSON spec.
> I've set up a wiki page for that:
> https://www.w3.org/2015/spatial/wiki/Cross_reference_of_UCR_to_CovJSON_spec
>  That should give us a focus for identifying and discussing issues around
> the details of the spec and provide evidence of the suitability of the
> approach (or not, as the case may be).
>
>  There has also been substantial interest and work within the coverage
> sub-group on how to apply the RDF Data Cube vocabulary to coverage data,
> and some experiments on possible adaptations to it.  The main potential
> drawback of the RDF Data Cube approach in this context is its verbosity for
> large coverages.  My feeling is that the standard RDF Data Cube approach
> could be a good option in the subset of applications where the total data
> volume is not excessive - creating a qb:Observation and associated triples
> for each data point in a coverage.  I'd like to see us prepare a note of
> some sort to explain how that would work.  I also think it would be
> possible and desirable to document a transformation algorithm or process
> for converting CoverageJSON (with its 'abbreviated' approach to defining
> the domain of a coverage) to an RDF Data Cube representation.
>
>  So the proposed outputs of the group would then be:
>
>  1) the specification of the CoverageJSON format, to become a W3C
> Recommendation (and OGC equivalent)
>
> 2) a Primer document to help people understand how to get started with it.
>  (Noting that Maik has already prepared some learning material at
> https://covjson.gitbooks.io/cookbook/content/)
>
> 3) contributions to the SDW BP relating to coverage data, to explain how
> CovJSON would be applied in relevant applications
>
> 4) a note on how RDF Data Cube can be used for coverages and a process for
> converting CovJSON to RDF Data Cube
>
>  Naturally I expect to discuss this proposal in plenary and coverage
> sub-group calls!
>
>  Best regards
>
>  Bill
>
>

Received on Thursday, 21 July 2016 12:46:21 UTC