Re: Identifying coverage subsets from Bill Roberts on 2016-03-24 (public-sdw-wg@w3.org from March 2016)

From: Bill Roberts <bill@swirrl.com>
Date: Thu, 24 Mar 2016 08:21:02 +0000
To: Simon Cox <Simon.Cox@csiro.au>
Cc: Phil Archer <phila@w3.org>, "public-sdw-wg@w3.org" <public-sdw-wg@w3.org>
Message-ID: <CAMTVsunCs34C6CLyroOdxno_+FZoJty_sWX10AjM-aGRxJtEJQ@mail.gmail.com>
Thanks Simon - that looks very useful

On 24 March 2016 at 02:02, <Simon.Cox@csiro.au> wrote:

> RDA is back online now.
> See
> https://rd-alliance.org/group/data-citation-wg/outcomes/data-citation-recommendation.html
>
> -----Original Message-----
> From: Simon.Cox@csiro.au [mailto:Simon.Cox@csiro.au]
> Sent: Thursday, 24 March 2016 9:18 AM
> To: phila@w3.org; public-sdw-wg@w3.org
> Subject: [ExternalEmail] Identifying coverage subsets
>
> I notice that one of the issues discussed relates to identifiers for
> coverage subsets (and the relationship with queries ...).
> There is an RDA recommendation on 'Data Citation' which covers some of
> this area.
> It is primarily expressed as a series of requirements, rather than a
> solution, so may be a useful checklist here.
> See attached paper.
>
> Unfortunately the RDA website is down at the moment so I can't check the
> direct links to their outputs, but I think this will work when it is back
> up again:
>
> https://www.rd-alliance.org/group/data-citation-wg.html
>
> https://rd-alliance.org/system/files/documents/RDA-DC-Recommendations_150924.pdf
>
>
> -----Original Message-----
> From: Phil Archer [mailto:phila@w3.org]
> Sent: Thursday, 24 March 2016 8:46 AM
> To: SDW WG Public List <public-sdw-wg@w3.org>
> Subject: [Minutes-Cov] 2016-03-23
>
> The minutes of today's Coverages sub group meeting are at
> https://www.w3.org/2016/03/23-sdwcov-minutes and copied as text below.
>
> We were joined on this occasion by Bernadette Loscio and Newton Calegari,
> 2 of the editors of the DWBP doc, to talk about subsetting.
>
>
> ...
>
>     <billroberts>
>     [18]https://www.w3.org/2015/spatial/wiki/Coverage_UCR_notes
>
>       [18] https://www.w3.org/2015/spatial/wiki/Coverage_UCR_notes
>
>     billroberts: at bottom in summary see that subsetting came out
>     a lot
>     ... assign an identifier to a subset of a coverage of a dataset
>     ... also for provenance so you can point to how the processing
>     happened
>     ... the question of delivering a full coverage is a special
>     case of delivering a subset -- if we address addressing and
>     formatting it will be solved
>     ... also some use cases for point cloud and time series --
>     need to keep these in mind
>     ... also note that the region of interest might be complicated,
>     not just a bounding box, may be polygon or tunnel underground
>     ... any comments?
>
>     jtandy: they are the things I can recall
>
>     phila: note the way subsetting tumbles out because we are
>     struggling in dwbp to say something that is *not*
>     spatially-specific
>     ... dwbp does not have good use cases for this
>
>     billroberts: also we have time subsets and variable subsets
>
>     +q
>
>     <Zakim> jtandy, you wanted to query predefined subsets or
>     on-the-fly query
>
>     <eparsons> jtandy
>
>     jtandy: we had a long email thread on subsetting for BP
>     ... one kind is subsetting for useful chunks to be manageable
>     (a predefined set)
>     ... other kind is an on-the-fly query chunk
>     ... we need both
>     ... rdf datacube does predefined type but not query type
>
>     billroberts: datacube can be used for query-type but perhaps
>     less flexible
>
>     jtandy: when i assign an identifier to a subset it could be
>     anything
>     ... but a query type identifier is also an api, effectively
>
>     <phila> kerry: I hate us calling it subsetting given all the
>     different dimensions that we need to talk about
>
>     kerry: does not like "subsetting"
>
>     <phila> Discussion between phila and kerry about whether
>     audience for Coverages doc is only spatial folks
>
>     <phila> kerry: How about 'sub coverage?'
>
>     <phila> billroberts: That makes sense to me
>
>     <phila> phila: Doesn't like 'sub coverage'
>
>     <scribe> ACTION: kerry to present some suggestions for renaming
>     "subsetting" [recorded in
>     [19]http://www.w3.org/2016/03/23-sdwcov-minutes.html#action01]
>
>       [19] http://www.w3.org/2016/03/23-sdwcov-minutes.html#action01
>
>     <trackbot> Created ACTION-152 - Present some suggestions for
>     renaming "subsetting" [on Kerry Taylor - due 2016-03-30].
>
>     <BernadetteLoscio> yes!
>
>     <billroberts>
>     [20]http://w3c.github.io/dwbp/bp.html#EnableDataSubsetting
>
>       [20] http://w3c.github.io/dwbp/bp.html#EnableDataSubsetting
>
> DWBP subsetting
>
>     BernadetteLoscio: we have a proposal as in the irc, but it is
>     difficult to test
>     ... it is generic and important but there are different
>     approaches
>     ... e.g. apis, queries
>     ... we are not sure whether we should have this as a bp or to
>     just describe it
>     ... what would be helpful to you and how would it be testable?
>
>     <Zakim> jtandy, you wanted to ask if you could cover
>     'subsetting' as an example operation in your API
>
>     jtandy: when I look at subsetting I think it is one example of
>     the way you could work with data... there are other BP about
>     offering an API in DWBP
>     ... data subsetting makes a lot of sense for slices for
>     statistical, etc, but when I look more generically it is really
>     just an operation you provide through an API
>     ... could be just an illustrative example
>     ... but it makes a lot of sense for time series and satellite
>     data (somehow differently)
>
>     bernadette: should we also talk about subsetting for download?
>
>     jtandy: you should also talk about the data you take away after
>     downloading
>     ... I would suggest when working with large datasets a typical
>     use case would be an api to select parts of that dataset
>     ... difficult for you to reference what we do, but I suggest
>     just describe an illustrative example of a convenience API
>
>     BernadetteLoscio: perhaps we can talk about subsetting along
>     with downloads as another example
>
>     billroberts: the problem with api/query is that it is futile to
>     specify upfront what it should look like in general
>     ... maybe all we can do is say "you need an API" or else we end
>     up inventing yet another query language
>     ... needs to be up to the data provider
>
>     jtandy: agrees
>
>     phila: <moved us with his absent speech>
>
>     <phila> phila: Requirement no. 1 can assign an identifier to a
>     subset of a coverage dataset
>
>     phila: we have been saying "you just give it a uri", although a
>     uri *is* an api
>     ... for bulk download is it useful to say you can use the api
>     and you can give it an example of its own, e.g. meteorological
>     data for the last week
>     ... should this go in dwbp or sdw?
>     ... should dwbp do this ... your first ucr says you need to
>     assign an identifier to a subset
>
>     billroberts: yes it would be useful
>
>     jtandy: it makes sense for dwbp to provide some advice -- if
>     you have data that is too big for a web application then
>     provide a mechanism to get hold of bits of it
>     ... e.g. using predefined slices or an API
>     ... test by "here is a massive dataset -- can you work with it
>     in a browser app?"
>
>     billroberts: use cases where this emerged was wanting to attach
>     some metadata to it, something that is the full set, not a
>     subset
>
>     <phila> is that helpful newton_dwbp?
>
>     <newton_dwbp> I liked jtandy point
>
>     billroberts: need to look again at email thread on this, any
>     other comments?
>
>     BernadetteLoscio: we like jtandy's idea and will bring to our
>     dwbp discussion. thank you very much
>
> RDF datacube action
>
>     billroberts: which aspects of rdf datacube would be good for
>     defining subsets?
>
>     <Zakim> jtandy, you wanted to note qb:slice
>
>     billroberts: bill will write note on pros and cons of datacube
>     and mechanisms that would be helpful for subsets
>
>     dmitrybrizhinev: ... we are a group of students working on an
>     example implementation for coverages, we are worried about
>     verbosity of datacube
>     ... flipside is taking a subset with lots of granularity with a
>     sparql query is useful but verbose
>     ... i have been converting the coveragesjson to rdf but this is
>     the query... is there a best of both worlds?
>
>     billroberts: please share anything written up
>
>     jtandy: agree about way too verbose, jonblower keeps saying
>     this cannot be used to carry the data, but the metadata might
>     be useful
>
>     <jtandy> [21]https://www.w3.org/TR/vocab-data-cube/#slices
>
>       [21] https://www.w3.org/TR/vocab-data-cube/#slices
>
>     jtandy: for describing subsets there is qb:slice and also a
>     mechanism for creating arbitrary groups in the spec
>     ... leaving the data in a dense array is arguably no different
>     to the way we deal with geospatial stuff all the time, e.g.
>     geometry objects in WKT or in GML
>     ... because we want to treat the whole geometry as an object
>     (we don't break it up), the same can apply to a dense array of
>     data
>     ... in the same way the geosparql can provide operations on
>     data, when we are working with coverage data in a webby form we
>     need to provide some additional mechanism for querying inside
>
>     billroberts: e.g. 75th point of array needs to be accessible,
>     and you need some coordinates that stick with the points...
>     that kind of conciseness is needed for whole grid but when
>     there are only bits it may work well
>     ... datacube could work well itself for a small subset if not
>     the entire grid
>
>     jtandy: if you just want ith column and jth row ...
>
>     <phila> kerry: were you suggesting, Bill, that the QB model
>     could be used as a response format for a query over a bigger
>     set
>
>     <phila> billroberts: Not precisely, but that structure of an
>     observation
>
>     <phila> ... If you just have one data point, you need all the
>     dimensional info and the metadata. Some metadata applies to the
>     whole dataset, some to a specific point.
>
>     <phila> ... If you have a grid, you don't need all the coords
>     'cos you can work them out but a point cloud does need them.
>
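To make the grid versus point cloud distinction above concrete, here is a minimal Python sketch, not anything agreed in the group. It keeps the data as a dense array and expands only a requested index-space subset into self-describing observations, working the coordinates out from a regular axis definition. The function name, the (origin, step) axis tuples and the dictionary keys are all assumptions made for illustration.

```python
from typing import Dict, Iterator, Sequence, Tuple


def expand_subset(
    values: Sequence[Sequence[float]],
    x_axis: Tuple[float, float],
    y_axis: Tuple[float, float],
    rows: range,
    cols: range,
    measure: str,
) -> Iterator[Dict[str, float]]:
    """Yield one observation per cell in the requested row/column ranges.

    Each axis is given as (origin, step). On a regular grid that is enough
    to derive every coordinate from its index, so the dense array itself
    carries no coordinates. A point cloud would have to store them per point.
    """
    x0, dx = x_axis
    y0, dy = y_axis
    for j in rows:
        for i in cols:
            # Every observation carries its own dimension values, so it
            # still makes sense once detached from the full grid.
            yield {"x": x0 + i * dx, "y": y0 + j * dy, measure: values[j][i]}
```

For a small subset the expanded form stays manageable; expanding an entire grid this way is where the verbosity concern comes in.
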
>     billroberts: the structure of an observation is very useful for
>     datacube way
>
>     jtandy: index space querying, natural coord subsetting, more
>     work to do here...
>
>     phila: what proportion of coverage data is on a regular grid?
>     ... I am thinking of those with only 2 or 3 lines with regular
>     definition and you can work the rest out
>     ... in such cases a template uri could be generated that does
>     identify a "slice"
>     ... so we could say "if you have a regular grid pattern this is
>     how you generate the uri template"
>
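One way the template URI idea above might look, as a hedged Python sketch only. Each axis of a regular grid is described by an origin, a step and a count (the "2 or 3 lines with regular definition"), and a slice identifier is built from inclusive index ranges. The example.org base URI, the path layout and the `name=lo:hi` parameter syntax are hypothetical choices, not something defined by the group or by any existing spec.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Axis:
    """One regularly spaced grid axis: origin, step and number of cells."""
    name: str
    origin: float
    step: float
    count: int

    def coord(self, index: int) -> float:
        # On a regular grid the coordinate follows from the index alone.
        if not 0 <= index < self.count:
            raise IndexError(f"{self.name} index {index} out of range")
        return self.origin + index * self.step


# Hypothetical template; path layout and parameter names are assumptions.
TEMPLATE = "http://example.org/coverage/{coverage}/slice?{ranges}"


def slice_uri(coverage: str, axes: List[Axis],
              ranges: Dict[str, Tuple[int, int]]) -> str:
    """Build an identifier for an index-space slice (inclusive ranges).

    Axes missing from `ranges` default to their full extent, so the whole
    coverage is just the special case of an unrestricted slice.
    """
    parts = []
    for axis in axes:  # fixed axis order keeps equal slices on equal URIs
        lo, hi = ranges.get(axis.name, (0, axis.count - 1))
        if lo > hi:
            raise ValueError(f"{axis.name}: lower index exceeds upper index")
        axis.coord(lo)  # raises IndexError if the bound is off the grid
        axis.coord(hi)
        parts.append(f"{axis.name}={lo}:{hi}")
    return TEMPLATE.format(coverage=coverage, ranges="&".join(parts))
```

With a half-degree global grid, `slice_uri("sst", axes, {"lon": (0, 9)})` would identify the first ten longitude columns across all latitudes. Irregular coverages have no such shortcut, since their coordinates cannot be derived from an index.
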
>     <Zakim> jtandy, you wanted to respond to phila's question about
>     regular grids
>
>     jtandy: yes it is a large fraction by volume and number of
>     datasets, eg satellite imagery,
>     ... but there are other important cases such as in-situ
>     observations by radiosondes or buoys or gliders irregular
>     coverages happen more (like opendap/netcdf index-based
>     subsetting)
>
>     eparsons: agrees. my meta-question is, where is the stuff
>     with a more semantic approach -- are we just reinventing the
>     wheel of tools in other places?
>
>     jtandy: phil had said data is easy, metadata is challenge. the
>     metadata is the bit to get the advantage of linked data such as
>     what you are measuring etc
>     ... metadata as linked data is key, then something else for
>     dense arrays of data
>
>     eparsons: so let's not get worked up on data size then as there
>     are other approaches
>
>     billroberts: index array of data is good approach for some
>     stuff, but we need to think harder about others
>
> Jeremy Stole our time slot
>
>     billroberts: jeremy stole our timeslot
>     ... we had been proposing to follow the main group for time
>     changes
>
> ...
>
>
Received on Thursday, 24 March 2016 08:21:33 UTC