- From: <Simon.Cox@csiro.au>
- Date: Thu, 24 Mar 2016 02:02:06 +0000
- To: <Simon.Cox@csiro.au>, <phila@w3.org>, <public-sdw-wg@w3.org>
RDA is back online now.
See https://rd-alliance.org/group/data-citation-wg/outcomes/data-citation-recommendation.html
-----Original Message-----
From: Simon.Cox@csiro.au [mailto:Simon.Cox@csiro.au]
Sent: Thursday, 24 March 2016 9:18 AM
To: phila@w3.org; public-sdw-wg@w3.org
Subject: [ExternalEmail] Identifying coverage subsets
I notice that one of the issues discussed relates to identifiers for coverage subsets (and the relationship with queries ...).
There is an RDA recommendation on 'Data Citation' which covers some of this area.
It is primarily expressed as a series of requirements, rather than a solution, so may be a useful checklist here.
See attached paper.
Unfortunately the RDA website is down at the moment so I can't check the direct links to their outputs, but I think this will work when it is back up again:
https://www.rd-alliance.org/group/data-citation-wg.html
https://rd-alliance.org/system/files/documents/RDA-DC-Recommendations_150924.pdf
-----Original Message-----
From: Phil Archer [mailto:phila@w3.org]
Sent: Thursday, 24 March 2016 8:46 AM
To: SDW WG Public List <public-sdw-wg@w3.org>
Subject: [Minutes-Cov] 2016-03-23
The minutes of today's Coverages sub group meeting are at https://www.w3.org/2016/03/23-sdwcov-minutes and copied as text below.
We were joined on this occasion by Bernadette Loscio and Newton Calegari, 2 of the editors of the DWBP doc, to talk about subsetting.
...
<billroberts>
[18]https://www.w3.org/2015/spatial/wiki/Coverage_UCR_notes
billroberts: at bottom in summary see that subsetting came out
a lot
... assign an identifier to a subset of a coverage of a dataset
... also for provenance so you can point to how the processing
happened
... the question of delivering a full coverage is a special
case of delivering a subset -- if we address addressing and
formatting it will be solved
... also some use cases for point cloud and time series --
need to keep these in mind
... also note that the region of interest might be complicated,
not just a bounding box, may be polygon or tunnel underground
... any comments?
jtandy: they are the things I can recall
phila: note the way subsetting tumbles out because we are
struggling in dwbp to say something that is *not*
spatially-specific
... dwbp does not have good use cases for this
billroberts: also we have time subsets and variable subsets
+q
<Zakim> jtandy, you wanted to query predefined subsets or
on-the-fly query
<eparsons> jtandy
jtandy: we had a long email thread on subsetting for BP
... one kind is subsetting for useful chunks to be manageable
(a predefined set)
... other kind is an on-the-fly query chunk
... we need both
... rdf datacube does predefined type but not query type
billroberts: datacube can be used for query-type but perhaps
less flexible
jtandy: when I assign an identifier to a subset it could be
anything
... but a query type identifier is also an api, effectively
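The two kinds of identifier distinguished above can be sketched as follows. This is only an illustration: the base URI, the `slice` path segment and the `bbox` and `time` parameter names are all hypothetical and do not come from the minutes or from any specification.

```python
from urllib.parse import urlencode

# Hypothetical dataset URI, used for illustration only.
BASE = "https://example.org/coverage/temp"


def predefined_slice_uri(slice_name: str) -> str:
    """Identifier for a chunk that the publisher defined in advance."""
    return f"{BASE}/slice/{slice_name}"


def query_subset_uri(bbox: tuple, time: str) -> str:
    """Identifier for an on-the-fly subset; the URI doubles as an API call."""
    params = {"bbox": ",".join(str(v) for v in bbox), "time": time}
    return f"{BASE}?{urlencode(params)}"
```

Under these assumptions, `predefined_slice_uri("2016-03")` names a fixed chunk, while `query_subset_uri((0, 50, 2, 52), "2016-03-23")` names whatever the query returns, which is why the second form is effectively an API.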
<phila> kerry: I hate us calling it subsetting given all the
different dimensions that we need to talk about
kerry: does not like "subsetting"
<phila> Discussion between phila and kerry about whether
audience for Coverages doc is only spatial folks
<phila> kerry: How about 'sub coverage?'
<phila> billroberts: That makes sense to me
<phila> phila: Doesn't like 'sub coverage'
<scribe> ACTION: kerry to present some suggestions for renaming
"subsetting" [recorded in
[19]http://www.w3.org/2016/03/23-sdwcov-minutes.html#action01]
<trackbot> Created ACTION-152 - Present some suggestions for
renaming "subsetting" [on Kerry Taylor - due 2016-03-30].
<BernadetteLoscio> yes!
<billroberts>
[20]http://w3c.github.io/dwbp/bp.html#EnableDataSubsetting
DWBP subsetting
BernadetteLoscio: we have a proposal as in the irc, but it is
difficult to test
... it is generic and important but there are different
approaches
... e.g. apis, queries
... we are not sure whether we should have this as a bp or to
just describe it
... what would be helpful to you and how would it be testable?
<Zakim> jtandy, you wanted to ask if you could cover
'subsetting' as an example operation in your API
jtandy: when I look at subsetting I think it is one example of
the way you could work with data... there are other BP about
offering an API in DWBP
... data subsetting makes a lot of sense for slices for
statistical, etc, but when I look more generically it is really
just an operation you provide through an API
... could be just an illustrative example
... but it makes a lot of sense for time series and satellite
data (somehow differently)
BernadetteLoscio: should we also talk about subsetting for download?
jtandy: you should also talk about the data you take away after
downloading
... I would suggest when working with large datasets a typical
use case would be an api to select parts of that dataset
... difficult for you to reference what we do, but I suggest
just describe an illustrative example of a convenience API
BernadetteLoscio: perhaps we can talk about subsetting along
with downloads as another example
billroberts: the problem with api/query is that it is futile to
specify upfront what it should look like in general
... maybe all we can do is say "you need an API" or else we end
up inventing yet another query language
... needs to be up to the data provider
jtandy: agrees
phila: <moved us with his absent speech>
<phila> phila: Requirement no. 1 can assign an identifier to a
subset of a coverage dataset
phila: we have been saying "you just give it a uri", although a
uri *is* an api
... for bulk download is it useful to say you can use the api
and you can give it an example of its own, e.g. meteorological
data for the last week
... should this go in dwbp or sdw?
... should dwbp do this ... your first ucr says you need to
assign an identifier to a subset
billroberts: yes it would be useful
jtandy: it makes sense for dwbp to provide some advice -- if
you have data that is too big for a web application then
provide a mechanism to get hold of bits of it
... eg. using predefined slices or an API
... test by "here is a massive dataset -- can you work with it
in a browser app?"
billroberts: use cases where this emerged was wanting to attach
some metadata to it, something that is the full set, not a
subset
<phila> is that helpful newton_dwbp?
<newton_dwbp> I liked jtandy point
billroberts: need to look again at email thread on this, any
other comments?
BernadetteLoscio: we like jtandy's idea and will bring to our
dwbp discussion. thank you very much
RDF datacube action
billroberts: which aspects of rdf datacube would be good for
defining subsets?
<Zakim> jtandy, you wanted to note qb:slice
billroberts: bill will write note on pros and cons of datacube
and mechanisms that would be helpful for subsets
dmitrybrizhinev: ... we are a group of students working on an
example implementation for coverages, we are worried about
verbosity of datacube
... flipside is that taking a subset with lots of granularity with a
sparql query is useful but verbose
... I have been converting the CoverageJSON to rdf but this is
the query... is there a best of both worlds?
billroberts: please share anything written up
jtandy: agree about way too verbose, jonblower keeps saying
this cannot be used to carry the data, but the metadata might
be useful
<jtandy> [21]https://www.w3.org/TR/vocab-data-cube/#slices
jtandy: for describing subsets there is qb:slice and also a
mechanism for creating arbitrary groups in the spec
... leaving the data in a dense array is arguably no different
to the way we deal with geospatial stuff all the time, eg
geometry objects in WKT or in GML
... because we want to treat the whole geometry as an object
(we don't break it up), the same can apply to a dense array of
data
... in the same way that geosparql can provide operations on
data, when we are working with coverage data in a webby form we
need to provide some additional mechanism for querying inside
billroberts: e.g. 75th point of array needs to be accessible,
and you need some coordinates that stick with the points...
that kind of conciseness is needed for whole grid but when
there are only bits it may work well
... datacube could work well itself for a small subset if not
the entire grid
jtandy: if you just want ith column and jth row ...
<phila> kerry: were you suggesting, Bill, that the QB model
could be used as a response format for a query over a bigger
set
<phila> billroberts: Not precisely, but that structure of an
observation
<phila> ... If you just have one data point, you need all the
dimensional info and the metadata. Some metadata applies to the
whole dataset, some to a specific point.
<phila> ... If you have a grid, you don't need all the coords
'cos you can work them out but a point cloud does need them.
billroberts: the structure of an observation is very useful for
datacube way
jtandy: index space querying, natural coord subsetting, more
work to do here...
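A minimal sketch of the distinction between index-space querying and natural-coordinate subsetting on a regular grid axis, assuming the axis is fully described by an origin, a positive step and a point count. The function names are hypothetical. It also illustrates the earlier point that grid coordinates can be worked out rather than stored, whereas a point cloud would have to carry them explicitly.

```python
import math


def grid_coords(origin: float, step: float, count: int) -> list:
    """Coordinates of a regular axis, derived from three numbers."""
    return [origin + i * step for i in range(count)]


def index_range(origin: float, step: float, count: int, lo: float, hi: float):
    """Map a natural-coordinate interval [lo, hi] to inclusive (first, last) indices.

    Returns None when the interval contains no grid point. Assumes step > 0.
    """
    first = max(0, math.ceil((lo - origin) / step))
    last = min(count - 1, math.floor((hi - origin) / step))
    return (first, last) if first <= last else None
```

For example, on an axis starting at 50.0 with step 0.5 and 5 points, a request for the interval 50.5 to 51.0 maps to indices 1 to 2. Interval bounds that fall very close to a grid point may be affected by floating-point rounding, which a real service would need to handle.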
phila: what proportion of coverage data is on a regular grid?
... I am thinking of those with only 2 or 3 lines with regular
definition and you can work the rest out
... in such cases a template uri could be generated that does
identify a "slice"
... so we could say "if you have a regular grid pattern this is
how you generate the uri template"
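One way such a template might be generated for a regular grid is sketched below. The URI layout (`/slice/` followed by one index range per axis) and the `_from`/`_to` variable names are assumptions made for illustration; the variables are written in the `{name}` style of simple URI template expansion and filled here with ordinary Python string formatting.

```python
def slice_uri_template(base: str, axes: list) -> str:
    """Build a template with one inclusive index range per grid axis."""
    parts = "/".join(f"{{{a}_from}}-{{{a}_to}}" for a in axes)
    return f"{base}/slice/{parts}"


def expand(template: str, **values) -> str:
    """Fill the template with concrete index values."""
    return template.format(**values)


# Hypothetical dataset URI and axis names.
template = slice_uri_template("https://example.org/coverage/temp", ["x", "y"])
slice_uri = expand(template, x_from=0, x_to=9, y_from=20, y_to=29)
```

With these assumptions every rectangular block of the grid gets a predictable identifier, without the publisher having to mint each one in advance.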
<Zakim> jtandy, you wanted to respond to phila's question about
regular grids
jtandy: yes it is a large fraction by volume and number of
datasets, eg satellite imagery,
... but there are other important cases such as in-situ
observations by radiosondes or buoys or gliders irregular
coverages happen more (like opendap/netcdf index-based
subsetting)
eparsons: agrees. my meta-question is, where is the stuff
with a more semantic approach -- are we just reinventing the
wheel of tools in other places?
jtandy: phil had said data is easy, metadata is challenge. the
metadata is the bit to get the advantage of linked data such as
what you are measuring etc
... metadata as linked data is key, then something else for
dense arrays of data
eparsons: so let's not get worked up on data size then as there
are other approaches
billroberts: index array of data is good approach for some
stuff, but we need to think harder about others
Jeremy Stole our time slot
billroberts: jeremy stole our timeslot
... we had been proposing to follow the main group for time
changes
...
Received on Thursday, 24 March 2016 02:02:59 UTC