- From: Bill Roberts <bill@swirrl.com>
- Date: Thu, 24 Mar 2016 08:21:02 +0000
- To: Simon Cox <Simon.Cox@csiro.au>
- Cc: Phil Archer <phila@w3.org>, "public-sdw-wg@w3.org" <public-sdw-wg@w3.org>
- Message-ID: <CAMTVsunCs34C6CLyroOdxno_+FZoJty_sWX10AjM-aGRxJtEJQ@mail.gmail.com>
Thanks Simon - that looks very useful

On 24 March 2016 at 02:02, <Simon.Cox@csiro.au> wrote:

> RDA is back online now.
> See
> https://rd-alliance.org/group/data-citation-wg/outcomes/data-citation-recommendation.html
>
> -----Original Message-----
> From: Simon.Cox@csiro.au [mailto:Simon.Cox@csiro.au]
> Sent: Thursday, 24 March 2016 9:18 AM
> To: phila@w3.org; public-sdw-wg@w3.org
> Subject: [ExternalEmail] Identifying coverage subsets
>
> I notice that one of the issues discussed relates to identifiers for
> coverage subsets (and the relationship with queries ...).
> There is an RDA recommendation on 'Data Citation' which covers some of
> this area.
> It is primarily expressed as a series of requirements, rather than a
> solution, so may be a useful checklist here.
> See attached paper.
>
> Unfortunately the RDA website is down at the moment so I can't check the
> direct links to their outputs, but I think this will work when it is back
> up again:
>
> https://www.rd-alliance.org/group/data-citation-wg.html
>
> https://rd-alliance.org/system/files/documents/RDA-DC-Recommendations_150924.pdf
>
>
> -----Original Message-----
> From: Phil Archer [mailto:phila@w3.org]
> Sent: Thursday, 24 March 2016 8:46 AM
> To: SDW WG Public List <public-sdw-wg@w3.org>
> Subject: [Minutes-Cov] 2016-03-23
>
> The minutes of today's Coverages sub group meeting are at
> https://www.w3.org/2016/03/23-sdwcov-minutes and copied as text below.
>
> We were joined on this occasion by Bernadette Loscio and Newton Calegari,
> 2 of the editors of the DWBP doc, to talk about subsetting.
>
>
> ...
>
> <billroberts>
> [18]https://www.w3.org/2015/spatial/wiki/Coverage_UCR_notes
>
> [18] https://www.w3.org/2015/spatial/wiki/Coverage_UCR_notes
>
> billroberts: at bottom in summary see that subsetting came out
> a lot
> ... assign an identifier to a subset of a coverage of a dataset
> ... also for provenance so you can point to how the processing
> happened
> ... the question of delivering a full coverage is a special
> case of delivering a subset -- if we address addressing and
> formatting it will be solved
> ... also some use cases for point cloud and time series --
> need to keep these in mind
> ... also note that the region of interest might be complicated,
> not just a bounding box, may be polygon or tunnel underground
> ... any comments?
>
> jtandy: they are the things I can recall
>
> phila: note the way subsetting tumbles out because we are
> struggling in dwbp to say something that is *not*
> spatially-specific
> ... dwbp does not have good use cases for this
>
> billroberts: also we have time subsets and variable subsets
>
> +q
>
> <Zakim> jtandy, you wanted to query predefined subsets or
> on-the-fly query
>
> <eparsons> jtandy
>
> jtandy: we had a long email thread on subsetting for BP
> ... one kind is subsetting for useful chunks to be manageable
> (a predefined set)
> ... other kind is an on-the-fly query chunk
> ... we need both
> ... rdf datacube does predefined type but not query type
>
> billroberts: datacube can be used for query-type but perhaps
> less flexible
>
> jtandy: when i assign an identifier to a subset it could be
> anything
> ... but a query type identifier is also an api, effectively
>
> <phila> kerry: I hate us calling it subsetting given all the
> different dimensions that we need to talk about
>
> kerry: does not like "subsetting"
>
> <phila> Discussion between phila and kerry about whether
> audience for Coverages doc is only spatial folks
>
> <phila> kerry: How about 'sub coverage?'
>
> <phila> billroberts: That makes sense to me
>
> <phila> phila: Doesn't like 'sub coverage'
>
> <scribe> ACTION: kerry to present some suggestions for renaming
> "subsetting" [recorded in
> [19]http://www.w3.org/2016/03/23-sdwcov-minutes.html#action01]
>
> [19] http://www.w3.org/2016/03/23-sdwcov-minutes.html#action01
>
> <trackbot> Created ACTION-152 - Present some suggestions for
> renaming "subsetting" [on Kerry Taylor - due 2016-03-30].
>
> <BernadetteLoscio> yes!
>
> <billroberts>
> [20]http://w3c.github.io/dwbp/bp.html#EnableDataSubsetting
>
> [20] http://w3c.github.io/dwbp/bp.html#EnableDataSubsetting
>
> DWBP subsetting
>
> BernadetteLoscio: we have a proposal as in the irc, but it is
> difficult to test
> ... it is generic and important but there are different
> approaches
> ... e.g. apis, queries
> ... we are not sure whether we should have this as a bp or to
> just describe it
> ... what would be helpful to you and how would it be testable?
>
> <Zakim> jtandy, you wanted to ask if you could cover
> 'subsetting' as an example operation in your API
>
> jtandy: when I look at subsetting I think it is one example of
> the way you could work with data... there are other BP about
> offering an API in DWBP
> ... data subsetting makes a lot of sense for slices for
> statistical, etc, but when I look more generically it is really
> just an operation you provide thru an API
> ... could be just an illustrative example
> ... but it makes a lot of sense for time series and satellite
> data (somehow differently)
>
> bernadette: should we also talk about subsetting for download
>
> jtandy: you should also talk about the data you take away after
> downloading
> ... I would suggest when working with large datasets a typical
> use case would be an api to select parts of that dataset
> ... difficult for you to reference what we do, but I suggest
> just describe an illustrative example of a convenience API
>
> BernadetteLoscio: perhaps we can talk about subsetting along
> with downloads as another example
>
> billroberts: the problem with api/query is that it is futile to
> specify upfront what it should look like in general
> ... maybe all we can do is say "you need an API" or else we end
> up inventing yet another query language
> ... needs to be up to the data provider
>
> jtandy: agrees
>
> phila: <moved us with his absent speech>
>
> <phila> phila: Requirement no. 1 can assign an identifier to a
> subset of a coverage dataset
>
> phila: we have been saying "you just give it a uri", although a
> uri *is* an api
> ... for bulk download is it useful to say you can use the api
> and you can give it an example of its own, e.g. meteorological
> data for the last week
> ... should this go in dwbp or sdw?
> ... should dwbp do this ... your first ucr says you need to
> assign an identifier to a subset
>
> billroberts: yes it would be useful
>
> jtandy: it makes sense for dwbp to provide some advice -- if
> you have data that is too big for a web application then
> provide a mechanism to get hold of bits of it
> ... eg. using predefined slices or an API
> ... test by "here is a massive dataset -- can you work with it
> in a browser app?"
>
> billroberts: use cases where this emerged was wanting to attach
> some metadata to it, something that is the full set, not a
> subset
>
> <phila> is that helpful newton_dwbp?
>
> <newton_dwbp> I liked jtandy point
>
> billroberts: need to look again at email thread on this, any
> other comments?
>
> BernadetteLoscio: we like jtandy's idea and will bring to our
> dwbp discussion. thank you very much
>
> RDF datacube action
>
> billroberts: which aspects of rdf datacube would be good for
> defining subsets?
>
> <Zakim> jtandy, you wanted to note qb:slice
>
> billroberts: bill will write note on pros and cons of datacube
> and mechanisms that would be helpful for subsets
>
> dmitrybrizhinev: ... we are a group of students working on an
> example implementation for coverages, we are worried about
> verbosity of datacube
> ... flipside is taking a subset with lots of granularity with a
> sparql query is useful but verbose
> ... i have been converting the coveragesjson to rdf but this is
> the query... is there a best of both worlds
>
> billroberts: please share anything written up
>
> jtandy: agree about way too verbose, jonblower keeps saying
> this cannot be used to carry the data, but the metadata might
> be useful
>
> <jtandy> [21]https://www.w3.org/TR/vocab-data-cube/#slices
>
> [21] https://www.w3.org/TR/vocab-data-cube/#slices
>
> jtandy: for describing subsets there is qb:slice and also a
> mechanism for creating arbitrary groups in the spec
> ... leaving the data in a dense array is arguably no different
> to the way we deal with geospatial stuff all the time, eg
> geometry objects in WKT or in GML
> ... because we want to treat the whole geometry as an object
> (we don't break it up), the same can apply to a dense array of
> data
> ... in the same way the geosparql can provide operations on
> data, when we are working with coverage data in a webby form we
> need to provide some additional mechanism for querying inside
>
> billroberts: e.g. 75th point of array needs to be accessible,
> and you need some coordinates that stick with the points...
> that kind of conciseness is needed for whole grid but when
> there are only bits it may work well
> ... datacube could work well itself for a small subset if not
> the entire grid
>
> jtandy: if you just want ith column and jth row ...
>
> <phila> kerry: were you suggesting, Bill, that the QB model
> could be used as a response format for a query over a bigger
> set
>
> <phila> billroberts: Not precisely, but that structure of an
> observation
>
> <phila> ... If you just have one data point, you need all the
> dimensional info and the metadata. Some metadata applies to the
> whole dataset, some to a specific point.
>
> <phila> ... If you have a grid, you don't need all the coords
> 'cos you can work them out but a point cloud does need them.
>
> billroberts: the structure of an observation is very useful for
> datacube way
>
> jtandy: index space querying, natural coord subsetting, more
> work to do here...
>
> phila: what proportion of coverage data is on a regular grid?
> ... I am thinking of those with only 2 or 3 lines with regular
> definition and you can work the rest out
> ... in such cases a template uri could be generated that does
> identify a "slice"
> ... so we could say "if you have a regular grid pattern this is
> how you generate the uri template"
>
> <Zakim> jtandy, you wanted to respond to phila's question about
> regular grids
>
> jtandy: yes it is a large fraction by volume and number of
> datasets, eg satellite imagery,
> ... but there are other important cases such as in-situ
> observations by radiosondes or buoys or gliders irregular
> coverages happen more (like opendap/netcdf index-based
> subsetting)
>
> eparsons: agrees. my meta-question is, where is the stuff
> with a more semantic approach -- are we just reinventing the
> wheel of tools in other places?
>
> jtandy: phil had said data is easy, metadata is challenge. the
> metadata is the bit to get the advantage of linked data such as
> what you are measuring etc
> ... metadata as linked data is key, then something else for
> dense arrays of data
>
> eparsons: so let's not get worked up on data size then as there
> are other approaches
>
> billroberts: index array of data is good approach for some
> stuff, but we need to think harder about others
>
> Jeremy Stole our time slot
>
> billroberts: jeremy stole our timeslot
> ... we had been proposing to follow the main group for time
> changes
>
> ...
>
>
Received on Thursday, 24 March 2016 08:21:33 UTC