
Re: Subsetting BP

From: Eric Stephan <ericphb@gmail.com>
Date: Wed, 23 Mar 2016 16:42:39 -0700
Message-ID: <CAMFz4jhHxkUidecnhs1A27vrNuBkvxF_-SbmTe70wA-+L_M1Sw@mail.gmail.com>
To: Phil Archer <phila@w3.org>
Cc: Public DWBP WG <public-dwbp-wg@w3.org>
Very interesting discussion. I re-read Annette's BP write-up and one
aspect that caught my eye was the phrase "Subsetting approaches should aim
for a high ratio of needed data to unneeded data for the largest number of

Based on recent discussions, subsetting generally speaking isn't simply
saying I want a smaller amount of data; it's making a very specific request,
based on
* marking up the data (could mean indexing)
* using an API, slices, a query, or some abstracted data structure.
* Based on Jeremy's observations about bulk download, I would also expect
as a best practice that a subsetting implementation would be interactive
and take less time from request to result than downloading the entire

My only concern about an example is that it should have the widest appeal
possible. I like sticking to the single bus route example, which seems to hit
that mark.
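
To make the bus route example and the "ratio of needed data to unneeded
data" idea concrete, here is a minimal Python sketch. Everything in it is an
assumption made for illustration: the timetable records, the field names, and
the helper functions subset_by_route, size_in_bytes, and needed_ratio are all
hypothetical and do not come from the BP document or any real dataset.

```python
import json

# Hypothetical timetable records; the route numbers and stops are invented
# for illustration and do not come from any real dataset.
TIMETABLE = [
    {"route": "12", "stop": "Main St", "departs": "08:00"},
    {"route": "12", "stop": "Station", "departs": "08:15"},
    {"route": "40", "stop": "Main St", "departs": "08:05"},
    {"route": "40", "stop": "Airport", "departs": "08:45"},
    {"route": "7", "stop": "Harbour", "departs": "09:00"},
]


def subset_by_route(records, route):
    """Return only the records for one bus route (a predefined slice)."""
    return [r for r in records if r["route"] == route]


def size_in_bytes(records):
    """Size of the records serialized as compact JSON and encoded as UTF-8."""
    return len(json.dumps(records, separators=(",", ":")).encode("utf-8"))


def needed_ratio(records, route):
    """Fraction of the bulk download's bytes that a single-route client needs."""
    return size_in_bytes(subset_by_route(records, route)) / size_in_bytes(records)


if __name__ == "__main__":
    route_12 = subset_by_route(TIMETABLE, "12")
    print(len(route_12), "of", len(TIMETABLE), "records")
    print(f"{needed_ratio(TIMETABLE, '12'):.2f} of the bulk download is needed")
```

A client that only cares about one route would fetch the slice rather than
the whole timetable; the lower the ratio for a typical request, the stronger
the case for offering the subset alongside the bulk download.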

Regarding the SDW time, okay, that shouldn't be a show stopper (I can dream
of a future midday W3C meeting, can't I?). I am a bit torn between that and
seeing what is going on with the RDF Data Shapes WG.

Kind regards,

Eric S

On Wed, Mar 23, 2016 at 3:33 PM, Phil Archer <phila@w3.org> wrote:

> Thanks Eric,
> Newton and Bernadette were able to join us and we had a useful discussion
> about subsetting. The minutes are at
> https://www.w3.org/2016/03/23-sdwcov-minutes. My understanding was that,
> as we discussed in our own call earlier, the difficulty is that it is
> almost impossible to talk about this in the abstract.
> Jeremy Tandy said: it makes sense for dwbp to provide some advice -- if
>     you have data that is too big for a web application then
>     provide a mechanism to get hold of bits of it
>     ... eg. using predefined slices or an API
>     ... test by "here is a massive dataset -- can you work with it
>     in a browser app?"
> So my understanding - and it is no more than my understanding which may be
> inaccurate - is that there is agreement on:
> - bulk download is a BP, meaning, you should make all the data available
> for download, probably not in real time, for local processing.
> - If the dataset is large, it's a good idea to make subsets available,
> which can be done through an API and/or through defining subsets and giving
> them identifiers.
> - What that API looks like, or how to construct those URIs is always going
> to be specific to the dataset.
> What is not clear is whether we can create a genuine BP around this.
> Newton (rightly) asks how you can test it. Jeremy suggested - but it was
> on the hoof and shouldn't be taken as gospel - that a test might be whether
> the dataset is processable within a browser. Today's browsers can handle
> around 40MB without breaking into a sweat - 10 years ago, 1 MB might have
> caused problems, so the test advances with time nicely.
> IMHO, what Annette wrote is right (or very close to it), and the single
> bus route example is a good one; but I know we haven't reached a consensus
> view.
> We could readily add in another example and could, perhaps, explicitly
> talk about spatial coverages, payments data, and statistics as examples of
> datasets that can be very large but for which many applications only ever
> want a subset.
> On your question about regular time slots, no, the time is about to
> change. The switch to DST in the northern hemisphere and away from it in
> the south means SDW is about to switch time slots. I can advise when the
> new time has been decided, but it's likely to be between 6 and 8 am your
> time.
> Phil.
> On 23/03/2016 21:04, Eric Stephan wrote:
>> Phil,
>> I just saw this note, thanks for reaching out; it would have been nice to
>> participate. If this is a recurring meeting time I'd like to
>> participate, especially with the DUV activities winding down.
>> Kind regards,
>> Eric S
>> On Wed, Mar 23, 2016 at 9:13 AM, Phil Archer <phila@w3.org> wrote:
>>> Just to let DWBP folks know that the Subsetting BP [1] is on the agenda
>>> for one of the Spatial Data WG's subgroup calls, which takes place at
>>> 20:00 UTC today (13:00 for Annette and Eric, 20:00 UK, 21:00 CET).
>>> I dare say that Bill Roberts, chair of that subgroup, would be happy for
>>> anyone in DWBP to join that call. Details at [2].
>>> Legal disclaimer:
>>> Please note that the SDW WG is run jointly with the OGC and therefore the
>>> output will be a joint OGC/W3C specification. In addition to the usual W3C
>>> rules, (almost exactly the same) rules apply for OGC; it's just handled
>>> differently. See [3].
>>> Phil.
>>> [1] http://w3c.github.io/dwbp/bp.html#EnableDataSubsetting
>>> [2] https://www.w3.org/2015/spatial/wiki/Meetings:Coverage-Telecon20160323
>>> [3] https://www.w3.org/2015/spatial/wiki/Patent_Call
>>> --
>>> Phil Archer
>>> W3C Data Activity Lead
>>> http://www.w3.org/2013/data/
>>> http://philarcher.org
>>> +44 (0)7887 767755
>>> @philarcher1
Received on Wednesday, 23 March 2016 23:43:07 UTC
