- From: Bill Roberts <bill@swirrl.com>
- Date: Mon, 13 Mar 2017 18:39:06 +0000
- To: "public-sdw-wg@w3.org" <public-sdw-wg@w3.org>, Dmitry Brizhinev <dmitry.brizhinev@anu.edu.au>, Sam Toyer <u5568237@anu.edu.au>, Kerry Taylor <kerry.taylor@anu.edu.au>
- Message-ID: <CAMTVsunjQjdmbJ5FY5s2PdGjDUMXCJ9SBtfAaqRcDEkEAHuqnA@mail.gmail.com>
Hi all,

I've had a detailed look through the editor's draft of EO-QB. Overall I think it's looking good, but I made a few comments as I went through. I've made a few suggested changes to the wording here and there in this pull request: https://github.com/w3c/sdw/pull/609. Nothing really significant, but I hope it might make things clearer and a little more precise in some places. There are some other comments or questions below that it might be interesting to address and perhaps discuss via the mailing list or in the next call.

Hope that's useful.

Best regards
Bill

Example 4 - declares the range of the measure property to be xsd:anyURI, but the example actually has a string as the value of that property. Maybe use <http://www.example.org/led-example-image-R000> instead?

Section 3.2 - What do you mean by: "With sufficiently advanced middleware, SPARQL queries over the dataset could be served just as if the data were stored in RDF, but for a fraction of the storage cost"? I can't see a query against pixel values working in any reasonable amount of time if the middleware has to 'unpack' each image to look inside it in order to answer the query. There is a balance of speed vs data size here, and if you optimise for data size, then you lose a lot of speed. So "The publisher can thus leverage the full power of Linked Data." seems a rash and unjustified claim here. Probably the less exciting-sounding "The publisher can thus leverage some of the power of Linked Data" :-)

"The RDF Data Cube provides only for “slices”" - it's true that the RDF Data Cube defines a mechanism for 'materialising' a slice and linking all the observations to it, so if you want all the values in a slice, there is an easy-to-evaluate SPARQL query that can get them. In practice, though, a SPARQL query can just as easily get all observations where the value of a dimension is equal to a chosen value (i.e. a 'slice'), so most people don't bother pre-defining slices.
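To show what I mean, the ad-hoc 'slice' is just an equality pattern. A sketch, with made-up eg: dimension and measure names (only the qb: terms are real):

```sparql
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX eg: <http://www.example.org/def/>

# All observations in a dataset where one dimension is fixed to a
# chosen value - an ad-hoc 'slice', with no qb:Slice resource needed.
# eg:dataset-1, eg:refArea, eg:refPeriod and eg:value are invented
# names standing in for whatever the cube actually defines.
SELECT ?obs ?time ?value
WHERE {
  ?obs a qb:Observation ;
       qb:dataSet   eg:dataset-1 ;
       eg:refArea   eg:area-canberra ;   # the fixed dimension
       eg:refPeriod ?time ;              # the free dimension
       eg:value     ?value .             # the measure
}
```

Every triple pattern there is an equality, so a store can answer it with index look-ups alone - which is why materialised slices don't buy you much.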
It just makes more triples for not much extra value - at least if you are serving data using SPARQL rather than pre-canning a lot of RDF files.

If you wanted to query for all observations with a location inside a bounding box, then your query would have to do some inequality evaluation, which is a fair bit slower than an index look-up - simple to write but slower to evaluate. You could perhaps do something like the 'tile' equivalent of a slice, by making a triple that links an observation to a rectangular area. So you might not be able to answer "all pixels within 10km of Canberra" quickly, but you could make it quick to find "all pixels in the 10km x 10km area that contains Canberra". This kind of thing sounds like a good match to the DGGS approach.

Section 4.1 - the description here of how a typical triple store works may be doing a disservice to the implementers of those databases! In general, a lot of those bindings will be evaluated by index look-ups. Most triple stores have some kind of 'explain query plan' method that shows what the database is going to do, if you want to investigate the details. I'm certainly not an expert on this, but this is quite an interesting article on how Stardog does it: https://blog.stardog.com/how-to-read-stardog-query-plans/. Other RDF databases are probably broadly similar. Add a reference for the 'virtual graphs' approach?

Section 5.3 - "The working group intends to standardize better properties which allow the use of other CRSs" - does that refer to work on updating GeoSPARQL? I'm not sure what we'll actually be able to achieve in this area.
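Coming back to the bounding-box point above, the difference between the FILTER version and the 'tile' version might look something like this - again a sketch only, with invented eg:lat, eg:long and eg:tile property names:

```sparql
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX eg: <http://www.example.org/def/>

# Variant 1: bounding box via FILTER. The inequalities mean the store
# has to test candidate bindings rather than just look them up.
SELECT ?obs ?value
WHERE {
  ?obs a qb:Observation ;
       eg:lat   ?lat ;
       eg:long  ?long ;
       eg:value ?value .
  FILTER (?lat  >= -35.35 && ?lat  <= -35.25 &&
          ?long >= 149.05 && ?long <= 149.15)
}

# Variant 2: a pre-computed 'tile' link, analogous to a materialised
# slice - back to a single equality pattern, i.e. an index look-up.
#
# SELECT ?obs ?value
# WHERE {
#   ?obs a qb:Observation ;
#        eg:tile  eg:tile-canberra-10km ;
#        eg:value ?value .
# }
```

The tile variant only answers questions that line up with the pre-chosen rectangles, which is the trade-off I mean - and why a DGGS-style hierarchy of cells seems a natural fit.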
Received on Monday, 13 March 2017 18:39:40 UTC