- From: Eric Stephan <ericphb@gmail.com>
- Date: Wed, 23 Mar 2016 16:42:39 -0700
- To: Phil Archer <phila@w3.org>
- Cc: Public DWBP WG <public-dwbp-wg@w3.org>
- Message-ID: <CAMFz4jhHxkUidecnhs1A27vrNuBkvxF_-SbmTe70wA-+L_M1Sw@mail.gmail.com>
Very interesting discussion. I re-read Annette's BP write-up, and one
aspect that caught my eye was the phrase "Subsetting approaches should
aim for a high ratio of needed data to unneeded data for the largest
number of users."

Based on recent discussions, subsetting, generally speaking, isn't simply
saying "I want a smaller amount of data"; it's making a very specific
request, based on:

* Marking up the data (could mean indexing).
* Using an API, slices, a query, or some abstracted data structure.
* Based on Jeremy's observations about bulk download, I would also
  expect as a best practice that a subsetting implementation would be
  interactive and take less time from request to result than downloading
  the entire dataset/distribution.

My only concern about an example is that it has the widest appeal
possible. Sticking to the single bus route example seems to hit that
mark.

Regarding the SDW time, okay, that shouldn't be a show stopper (I can
dream of a future midday W3C meeting, can't I?). I am a bit torn between
that and seeing what is going on with the RDF Data Shapes WG.

Kind regards,

Eric S

On Wed, Mar 23, 2016 at 3:33 PM, Phil Archer <phila@w3.org> wrote:

> Thanks Eric,
>
> Newton and Bernadette were able to join us and we had a useful
> discussion about subsetting. The minutes are at
> https://www.w3.org/2016/03/23-sdwcov-minutes. My understanding was
> that, as we discussed in our own call earlier, the difficulty is that
> it is almost impossible to talk about this in the abstract.
>
> Jeremy Tandy said: it makes sense for dwbp to provide some advice --
> if you have data that is too big for a web application then
> provide a mechanism to get hold of bits of it
> ... eg. using predefined slices or an API
> ... test by "here is a massive dataset -- can you work with it
> in a browser app?
>
> So my understanding - and it is no more than my understanding, which
> may be inaccurate - is that there is agreement on:
>
> - Bulk download is a BP, meaning you should make all the data
>   available for download, probably not in real time, for local
>   processing.
>
> - If the dataset is large, it's a good idea to make subsets available,
>   which can be done through an API and/or through defining subsets and
>   giving them identifiers.
>
> - What that API looks like, or how to construct those URIs, is always
>   going to be specific to the dataset.
>
> What is not clear is whether we can create a genuine BP around this.
>
> Newton (rightly) asks how you can test it. Jeremy suggested - but it
> was on the hoof and shouldn't be taken as gospel - that a test might be
> whether the dataset is processable within a browser. Today's browsers
> can handle around 40 MB without breaking into a sweat - 10 years ago,
> 1 MB might have caused problems, so the test advances with time nicely.
>
> IMHO, what Annette wrote is right (or very close to it), and the single
> bus route example is a good one; but I know we haven't reached a
> consensus view.
>
> We could readily add in another example and could, perhaps, explicitly
> talk about spatial coverages, payments data, and statistics as examples
> of datasets that can be very large but for which many applications only
> ever want a subset.
>
> On your question about regular time slots: no, the time is about to
> change. The switch to DST in the northern hemisphere and away from it
> in the south means SDW is about to switch time slots. I can advise when
> the new time has been decided, but it's likely to be between 6 and 8 am
> your time.
>
> Phil.
>
>
> On 23/03/2016 21:04, Eric Stephan wrote:
>
>> Phil,
>>
>> I just saw this note. Thanks for reaching out; it would have been nice
>> to participate. If this is a recurring meeting time, I'd like to
>> participate, especially with the DUV activities winding down.
>>
>> Kind regards,
>>
>> Eric S
>>
>> On Wed, Mar 23, 2016 at 9:13 AM, Phil Archer <phila@w3.org> wrote:
>>
>>> Just to let DWBP folks know that the Subsetting BP [1] is on the
>>> agenda for one of the Spatial data WG's sub group calls which takes
>>> place at 20:00 UTC today (13:00 for Annette and Eric, 20:00 UK,
>>> 21:00 CET).
>>>
>>> I dare say that Bill Roberts, chair of that subgroup, would be happy
>>> for anyone in DWBP who wishes to join that call. Details at [2].
>>>
>>> Legal disclaimer:
>>> Please note that the SDW WG is run jointly with the OGC and therefore
>>> the output will be a joint OGC/W3C specification. In addition to the
>>> usual W3C rules, the (almost exactly the same) rules apply for OGC,
>>> it's just handled differently. See [3].
>>>
>>> Phil.
>>>
>>> [1] http://w3c.github.io/dwbp/bp.html#EnableDataSubsetting
>>> [2] https://www.w3.org/2015/spatial/wiki/Meetings:Coverage-Telecon20160323
>>> [3] https://www.w3.org/2015/spatial/wiki/Patent_Call
>>>
>>> --
>>>
>>> Phil Archer
>>> W3C Data Activity Lead
>>> http://www.w3.org/2013/data/
>>>
>>> http://philarcher.org
>>> +44 (0)7887 767755
>>> @philarcher1
>>
>
> --
>
> Phil Archer
> W3C Data Activity Lead
> http://www.w3.org/2013/data/
>
> http://philarcher.org
> +44 (0)7887 767755
> @philarcher1
Received on Wednesday, 23 March 2016 23:43:07 UTC