
Re: Subsetting BP

From: Phil Archer <phila@w3.org>
Date: Wed, 23 Mar 2016 22:33:58 +0000
To: Eric Stephan <ericphb@gmail.com>, Public DWBP WG <public-dwbp-wg@w3.org>
Message-ID: <56F319D6.9060601@w3.org>
Thanks Eric,

Newton and Bernadette were able to join us and we had a useful 
discussion about subsetting. The minutes are at 
https://www.w3.org/2016/03/23-sdwcov-minutes. My understanding was that, 
as we discussed in our own call earlier, the difficulty is that it is 
almost impossible to talk about this in the abstract.

Jeremy Tandy said: it makes sense for dwbp to provide some advice -- if
     you have data that is too big for a web application then
     provide a mechanism to get hold of bits of it
     ... eg. using predefined slices or an API
     ... test by "here is a massive dataset -- can you work with it
     in a browser app?"

So my understanding - and it is no more than my understanding which may 
be inaccurate - is that there is agreement on:

- bulk download is a BP, meaning, you should make all the data available 
for download, probably not in real time, for local processing.

- If the dataset is large, it's a good idea to make subsets available, 
which can be done through an API and/or through defining subsets and 
giving them identifiers.

- What that API looks like, or how to construct those URIs, is always 
going to be specific to the dataset.
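
To make the second and third points concrete, here is a minimal sketch of predefined subsets that each get their own identifier. Everything in it is an assumption for illustration: the base URI, the record layout and the function names are hypothetical, and a real publisher would pick a slicing key and URI pattern that suit their own dataset.

```python
from collections import defaultdict
from urllib.parse import quote

# Hypothetical base URI for the full dataset; not a real service.
BASE_URI = "http://example.org/dataset/bus-timetable"


def subset_uri(route_id: str) -> str:
    """Build an identifier for the subset covering a single route."""
    # Percent-encode the route id so it is safe as one path segment.
    return f"{BASE_URI}/route/{quote(route_id, safe='')}"


def slice_by_route(records):
    """Group records into predefined subsets keyed by their subset URI."""
    subsets = defaultdict(list)
    for record in records:
        subsets[subset_uri(record["route"])].append(record)
    return dict(subsets)


# Made-up sample records, for illustration only.
records = [
    {"route": "42", "stop": "Market St", "time": "08:05"},
    {"route": "42", "stop": "Station", "time": "08:15"},
    {"route": "7A", "stop": "Station", "time": "08:10"},
]

subsets = slice_by_route(records)
for uri, rows in subsets.items():
    print(uri, len(rows))
```

The full list of records stands in for the bulk download, and each key in `subsets` stands in for a subset a client could fetch on its own.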

What is not clear is whether we can create a genuine BP around this.

Newton (rightly) asks how you can test it. Jeremy suggested - but it was 
on the hoof and shouldn't be taken as gospel - that a test might be 
whether the dataset is processable within a browser. Today's browsers 
can handle around 40MB without breaking into a sweat - 10 years ago, 1 
MB might have caused problems, so the test advances with time nicely.
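
As a rough sketch of how such a test could be expressed, the check below compares a dataset's size against a browser budget. The 40 MB figure is only the ballpark mentioned above, treated here as an adjustable assumption rather than an agreed threshold, and the names are hypothetical.

```python
# Assumed browser budget: "around 40MB" today; expected to grow over time.
BROWSER_LIMIT_BYTES = 40 * 1024 * 1024


def needs_subsetting(size_bytes: int, limit_bytes: int = BROWSER_LIMIT_BYTES) -> bool:
    """Return True when a dataset is too large to process in a browser app."""
    return size_bytes > limit_bytes


print(needs_subsetting(5 * 1024 * 1024))     # a 5 MB dataset
print(needs_subsetting(2 * 1024 ** 3))       # a 2 GB dataset
```

Passing a different `limit_bytes` lets the same check track whatever browsers can comfortably handle at the time.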

IMHO, what Annette wrote is right (or very close to it), and the single 
bus route example is a good one; but I know we haven't reached a 
consensus view.

We could readily add in another example and could, perhaps, explicitly 
talk about spatial coverages, payments data, and statistics as examples 
of datasets that can be very large but for which many applications only 
ever want a subset.

On your question about regular time slots, no, the time is about to 
change. The switch to DST in the northern hemisphere and away from it in 
the south means SDW is about to switch time slots. I can advise when the 
new time has been decided, but it's likely to be between 6 and 8 am your 



On 23/03/2016 21:04, Eric Stephan wrote:
> Phil,
> I just saw this note, thanks for reaching out, it would have been nice to
> participate. If this is a reoccurring meeting time I'd like to
> participate especially with the DUV activities winding down.
> Kind regards,
> Eric S
> On Wed, Mar 23, 2016 at 9:13 AM, Phil Archer <phila@w3.org> wrote:
>> Just to let DWBP folks know that the Subsetting BP [1] is on the agenda
>> for one of the Spatial data WG's sub group calls which takes place at 20:00
>> UTC today (13:00 for Annette and Eric, 20:00 UK, 21:00 CET).
>> I dare say that Bill Roberts, chair of that subgroup, would be happy for
>> anyone in DWBP who wishes to join that call. Details at [2].
>> Legal disclaimer:
>> Please note that the SDW WG is run jointly with the OGC and therefore the
>> output will be a joint OGC/W3C specification. In addition to the usual W3C
>> rules, the (almost exactly the same) rules apply for OGC; it's just handled
>> differently. See [3].
>> Phil.
>> [1] http://w3c.github.io/dwbp/bp.html#EnableDataSubsetting
>> [2] https://www.w3.org/2015/spatial/wiki/Meetings:Coverage-Telecon20160323
>> [3] https://www.w3.org/2015/spatial/wiki/Patent_Call
>> --
>> Phil Archer
>> W3C Data Activity Lead
>> http://www.w3.org/2013/data/
>> http://philarcher.org
>> +44 (0)7887 767755
>> @philarcher1


Phil Archer
W3C Data Activity Lead

+44 (0)7887 767755
Received on Wednesday, 23 March 2016 22:34:12 UTC
