- From: <Simon.Cox@csiro.au>
- Date: Wed, 23 Mar 2016 22:17:45 +0000
- To: <phila@w3.org>, <public-sdw-wg@w3.org>
- Message-ID: <9055ec0add6849b0b7dbfb5195774bc7@exch1-mel.nexus.csiro.au>
I notice that one of the issues discussed relates to identifiers for coverage subsets (and the relationship with queries ...). There is an RDA recommendation on 'Data Citation' which covers some of this area. It is primarily expressed as a series of requirements, rather than a solution, so may be a useful checklist here. See attached paper.

Unfortunately the RDA website is down at the moment so I can't check the direct links to their outputs, but I think this will work when it is back up again:

https://www.rd-alliance.org/group/data-citation-wg.html
https://rd-alliance.org/system/files/documents/RDA-DC-Recommendations_150924.pdf

-----Original Message-----
From: Phil Archer [mailto:phila@w3.org]
Sent: Thursday, 24 March 2016 8:46 AM
To: SDW WG Public List <public-sdw-wg@w3.org>
Subject: [Minutes-Cov] 2016-03-23

The minutes of today's Coverages sub group meeting are at https://www.w3.org/2016/03/23-sdwcov-minutes and copied as text below. We were joined on this occasion by Bernadette Loscio and Newton Calegari, 2 of the editors of the DWBP doc, to talk about subsetting.

...

<billroberts> [18]https://www.w3.org/2015/spatial/wiki/Coverage_UCR_notes

[18] https://www.w3.org/2015/spatial/wiki/Coverage_UCR_notes

billroberts: at bottom in summary see that subsetting came out a lot
... assign an identifier to a subset of a coverage of a dataset
... also for provenance so you can point to how the processing happened
... the question of delivering a full coverage is a special case of delivering a subset -- if we address addressing and formatting it will be solved
... also some use cases for point cloud and time series -- need to keep these in mind
... also note that the region of interest might be complicated, not just a bounding box, may be polygon or tunnel underground
... any comments?

jtandy: they are the things I can recall

phila: note the way subsetting tumbles out because we are struggling in dwbp to say something that is *not* spatially-specific ...
dwbp does not have good use cases for this

billroberts: also we have time subsets and variable subsets +q

<Zakim> jtandy, you wanted to query predefined subsets or on-the-fly query

<eparsons> jtandy

jtandy: we had a long email thread on subsetting for BP
... one kind is subsetting for useful chunks to be manageable (a predefined set)
... other kind is an on-the-fly query chunk
... we need both
... rdf datacube does predefined type but not query type

billroberts: datacube can be used for query-type but perhaps less flexible

jtandy: when I assign an identifier to a subset it could be anything
... but a query type identifier is also an api, effectively

<phila> kerry: I hate us calling it subsetting given all the different dimensions that we need to talk about

kerry: does not like "subsetting"

<phila> Discussion between phila and kerry about whether audience for Coverages doc is only spatial folks

<phila> kerry: How about 'sub coverage?'

<phila> billroberts: That makes sense to me

<phila> phila: Doesn't like 'sub coverage'

<scribe> ACTION: kerry to present some suggestions for renaming "subsetting" [recorded in [19]http://www.w3.org/2016/03/23-sdwcov-minutes.html#action01]

[19] http://www.w3.org/2016/03/23-sdwcov-minutes.html#action01

<trackbot> Created ACTION-152 - Present some suggestions for renaming "subsetting" [on Kerry Taylor - due 2016-03-30].

<BernadetteLoscio> yes!

<billroberts> [20]http://w3c.github.io/dwbp/bp.html#EnableDataSubsetting

[20] http://w3c.github.io/dwbp/bp.html#EnableDataSubsetting

DWBP subsetting

BernadetteLoscio: we have a proposal as in the irc, but it is difficult to test
... it is generic and important but there are different approaches
... e.g. apis, queries
... we are not sure whether we should have this as a bp or to just describe it
... what would be helpful to you and how would it be testable?

<Zakim> jtandy, you wanted to ask if you could cover 'subsetting' as an example operation in your API

jtandy: when I look at subsetting I think it is one example of the way you could work with data... there are other BP about offering an API in DWBP
... data subsetting makes a lot of sense for slices for statistical, etc, but when I look more generically it is really just an operation you provide thru an API
... could be just an illustrative example
... but it makes a lot of sense for time series and satellite data (somehow differently)

bernadette: should we also talk about subsetting for download

jtandy: you should also talk about the data you take away after downloading
... I would suggest when working with large datasets a typical use case would be an api to select parts of that dataset
... difficult for you to reference what we do, but I suggest just describe an illustrative example of a convenience API

BernadetteLoscio: perhaps we can talk about subsetting along with downloads as another example

billroberts: the problem with api/query is that it is futile to specify upfront what it should look like in general
... maybe all we can do is say "you need an API" or else we end up inventing yet another query language
... needs to be up to the data provider

jtandy: agrees

phila: <moved us with his absent speech>

<phila> phila: Requirement no. 1 can assign an identifier to a subset of a coverage dataset

phila: we have been saying "you just give it a uri", although a uri *is* an api
... for bulk download is it useful to say you can use the api and you can give it an example of its own, e.g. meteorological data for the last week
... should this go in dwbp or sdw?
... should dwbp do this
... your first ucr says you need to assign an identifier to a subset

billroberts: yes it would be useful

jtandy: it makes sense for dwbp to provide some advice -- if you have data that is too big for a web application then provide a mechanism to get hold of bits of it
... e.g.
using predefined slices or an API
... test by "here is a massive dataset -- can you work with it in a browser app?"

billroberts: use cases where this emerged was wanting to attach some metadata to it, something that is the full set, not a subset

<phila> is that helpful newton_dwbp?

<newton_dwbp> I liked jtandy point

billroberts: need to look again at email thread on this, any other comments?

BernadetteLoscio: we like jtandy's idea and will bring to our dwbp discussion. thank you very much

RDF datacube action

billroberts: which aspects of rdf datacube would be good for defining subsets?

<Zakim> jtandy, you wanted to note qb:slice

billroberts: bill will write note on pros and cons of datacube and mechanisms that would be helpful for subsets

dmitrybrizhinev: ... we are a group of students working on an example implementation for coverages, we are worried about verbosity of datacube
... flipside is taking a subset with lots of granularity with a sparql query is useful but verbose
... I have been converting the CoverageJSON to rdf but this is the query... is there a best of both worlds

billroberts: please share anything written up

jtandy: agree about way too verbose, jonblower keeps saying this cannot be used to carry the data, but the metadata might be useful

<jtandy> [21]https://www.w3.org/TR/vocab-data-cube/#slices

[21] https://www.w3.org/TR/vocab-data-cube/#slices

jtandy: for describing subsets there is qb:slice and also a mechanism for creating arbitrary groups in the spec
... leaving the data in a dense array is arguably no different to the way we deal with geospatial stuff all the time, e.g. geometry objects in WKT or in GML
... because we want to treat the whole geometry as an object (we don't break it up), the same can apply to a dense array of data ...
in the same way the geosparql can provide operations on data, when we are working with coverage data in a webby form we need to provide some additional mechanism for querying inside

billroberts: e.g. 75th point of array needs to be accessible, and you need some coordinates that stick with the points... that kind of conciseness is needed for whole grid but when there are only bits it may work well
... datacube could work well itself for a small subset if not the entire grid

jtandy: if you just want ith column and jth row ...

<phila> kerry: were you suggesting, Bill, that the QB model could be used as a response format for a query over a bigger set

<phila> billroberts: Not precisely, but that structure of an observation

<phila> ... If you just have one data point, you need all the dimensional info and the metadata. Some metadata applies to the whole dataset, some to a specific point.

<phila> ... If you have a grid, you don't need all the coords 'cos you can work them out but a point cloud does need them.

billroberts: the structure of an observation is very useful for datacube way

jtandy: index space querying, natural coord subsetting, more work to do here...

phila: what proportion of coverage data is on a regular grid?
... I am thinking of those with only 2 or 3 lines with regular definition and you can work the rest out
... in such cases a template uri could be generated that does identify a "slice"
... so we could say "if you have a regular grid pattern this is how you generate the uri template"

<Zakim> jtandy, you wanted to respond to phila's question about regular grids

jtandy: yes it is a large fraction by volume and number of datasets, e.g. satellite imagery,
... but there are other important cases such as in-situ observations by radiosondes or buoys or gliders irregular coverages happen more (like opendap/netcdf index-based subsetting)

eparsons: agrees.
my meta-question is, where is the stuff with a more semantic approach -- are we just reinventing the wheel of tools in other places?

jtandy: phil had said data is easy, metadata is challenge. the metadata is the bit to get the advantage of linked data such as what you are measuring etc
... metadata as linked data is key, then something else for dense arrays of data

eparsons: so let's not get worked up on data size then as there are other approaches

billroberts: index array of data is good approach for some stuff, but we need to think harder about others

Jeremy Stole our time slot

billroberts: jeremy stole our timeslot
... we had been proposing to follow the main group for time changes

...
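
As a rough illustration of phila's point in the minutes about regular grids, the sketch below derives coordinates from grid indices and builds a URI for an index-space slice. Everything in it is an assumption made for illustration: the `Axis` class, the `slice_uri` function, the `example.org` URI pattern and the `start:stop` range syntax are hypothetical and are not defined by any SDW or DWBP document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Axis:
    """One axis of a regular grid; coordinates are implied by origin and step."""
    name: str
    origin: float
    step: float
    size: int

    def coord(self, index: int) -> float:
        """Work out the coordinate of a grid index instead of storing it."""
        if not 0 <= index < self.size:
            raise IndexError(f"{self.name} index {index} out of range")
        return self.origin + index * self.step


# Hypothetical URI pattern, chosen only for this sketch.
TEMPLATE = "https://example.org/coverage/{coverage_id}/slice?{query}"


def slice_uri(coverage_id: str, axes: list[Axis],
              ranges: dict[str, tuple[int, int]]) -> str:
    """Build an identifier for an index-space slice.

    Ranges are (start, stop) with stop exclusive. Axes not mentioned in
    `ranges` are included in full, and axes are always written in grid
    order, so the same subset always gets the same URI.
    """
    unknown = set(ranges) - {axis.name for axis in axes}
    if unknown:
        raise ValueError(f"unknown axes: {sorted(unknown)}")
    parts = []
    for axis in axes:
        start, stop = ranges.get(axis.name, (0, axis.size))
        if not 0 <= start < stop <= axis.size:
            raise ValueError(f"bad range for {axis.name}: {start}:{stop}")
        parts.append(f"{axis.name}={start}:{stop}")
    return TEMPLATE.format(coverage_id=coverage_id, query="&".join(parts))


if __name__ == "__main__":
    lat = Axis("lat", origin=-90.0, step=0.5, size=361)
    lon = Axis("lon", origin=-180.0, step=0.5, size=720)
    print(slice_uri("sst", [lat, lon], {"lat": (100, 200)}))
    print(lat.coord(100))
```

A scheme like this only covers the regular-grid case raised in the discussion; point clouds and other irregular coverages would still need their coordinates carried explicitly.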
Attachments
- application/pdf attachment: 20151012-TCDL-RDA-Guidelines_submission.pdf
Received on Wednesday, 23 March 2016 22:19:05 UTC