Re: Subsetting BP

From: Eric Stephan <ericphb@gmail.com>
Date: Wed, 23 Mar 2016 16:42:39 -0700
Message-ID: <CAMFz4jhHxkUidecnhs1A27vrNuBkvxF_-SbmTe70wA-+L_M1Sw@mail.gmail.com>
To: Phil Archer <phila@w3.org>
Cc: Public DWBP WG <public-dwbp-wg@w3.org>
Very interesting discussion.  I re-read Annette's BP write up and one
aspect that caught my eye was the phrase "Subsetting approaches should aim
for a high ratio of needed data to unneeded data for the largest number of
users."

Based on recent discussions, subsetting generally speaking isn't simply
saying I want a smaller amount of data; it's making a very specific request,
based on
* marking up the data (could mean indexing)
* Using an API, slices, a query or some abstracted data structure.
* Based on Jeremy's observations about bulk download, I would also expect
as a best practice that a subsetting implementation would be interactive
and take less time from request to result than downloading the entire
dataset/distribution.

My only concern about an example is that it should have the widest appeal
possible. I like sticking to the single bus route example, which seems to
hit that mark.
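
To make the bus route idea a little more concrete, here is a minimal sketch
in Python. It is only an illustration, not a proposal for the BP text: the
timetable rows, the column names and the helper functions subset_by_route
and needed_ratio are all made up for this example. It filters a small
timetable down to one route and reports what fraction of the full dataset a
single-route user actually needs.

```python
import csv
import io

# Hypothetical timetable data; the column names are assumptions for this sketch.
TIMETABLE_CSV = """route,stop,departure
42,Main St,08:00
42,Oak Ave,08:07
7,Main St,08:05
7,Harbor Rd,08:20
13,Elm St,08:10
"""


def subset_by_route(csv_text, route):
    """Return only the rows for one route, as a list of dicts."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["route"] == route]


def needed_ratio(csv_text, route):
    """Fraction of rows in the full dataset that a single-route user needs."""
    total = sum(1 for _ in csv.DictReader(io.StringIO(csv_text)))
    if total == 0:
        return 0.0
    return len(subset_by_route(csv_text, route)) / total


rows = subset_by_route(TIMETABLE_CSV, "42")
print(len(rows), "rows needed;", needed_ratio(TIMETABLE_CSV, "42"), "of the dataset")
```

A user who only cares about route 42 would otherwise download every row and
throw most of them away, which is the "needed to unneeded" ratio Annette's
text is getting at.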

Regarding the SDW time, okay, that shouldn't be a showstopper (I can dream
of a future midday W3C meeting, can't I?). I am a bit torn between that and
seeing what is going on with the RDF Data Shapes WG.

Kind regards,

Eric S


On Wed, Mar 23, 2016 at 3:33 PM, Phil Archer <phila@w3.org> wrote:

> Thanks Eric,
>
> Newton and Bernadette were able to join us and we had a useful discussion
> about subsetting. The minutes are at
> https://www.w3.org/2016/03/23-sdwcov-minutes. My understanding was that,
> as we discussed in our own call earlier, the difficulty is that it is
> almost impossible to talk about this in the abstract.
>
> Jeremy Tandy said: it makes sense for dwbp to provide some advice -- if
>     you have data that is too big for a web application then
>     provide a mechanism to get hold of bits of it
>     ... eg. using predefined slices or an API
>     ... test by "here is a massive dataset -- can you work with it
>     in a browser app?"
>
> So my understanding - and it is no more than my understanding which may be
> inaccurate - is that there is agreement on:
>
> - bulk download is a BP, meaning, you should make all the data available
> for download, probably not in real time, for local processing.
>
> - If the dataset is large, it's a good idea to make subsets available,
> which can be done through an API and/or through defining subsets and giving
> them identifiers.
>
> - What that API looks like, or how to construct those URIs is always going
> to be specific to the dataset.
>
> What is not clear is whether we can create a genuine BP around this.
>
> Newton (rightly) asks how you can test it. Jeremy suggested - but it was
> on the hoof and shouldn't be taken as gospel - that a test might be whether
> the dataset is processable within a browser. Today's browsers can handle
> around 40MB without breaking into a sweat - 10 years ago, 1 MB might have
> caused problems, so the test advances with time nicely.
>
> IMHO, what Annette wrote is right (or very close to it), and the single
> bus route example is a good one; but I know we haven't reached a consensus
> view.
>
> We could readily add in another example and could, perhaps, explicitly
> talk about spatial coverages, payments data, and statistics as examples of
> datasets that can be very large but for which many applications only ever
> want a subset.
>
> On your question about regular time slots, no, the time is about to
> change. The switch to DST in the northern hemisphere and away from it in
> the south means SDW is about to switch time slots. I can advise when the
> new time has been decided, but it's likely to be between 6 and 8 am your
> time.
>
> Phil.
>
>
> On 23/03/2016 21:04, Eric Stephan wrote:
>
>> Phil,
>>
>> I just saw this note, thanks for reaching out, it would have been nice to
>> participate.   If this is a recurring meeting time I'd like to
>> participate especially with the DUV activities winding down.
>>
>> Kind regards,
>>
>> Eric S
>>
>> On Wed, Mar 23, 2016 at 9:13 AM, Phil Archer <phila@w3.org> wrote:
>>
>>> Just to let DWBP folks know that the Subsetting BP [1] is on the agenda
>>> for one of the Spatial data WG's sub group calls which takes place at
>>> 20:00 UTC today (13:00 for Annette and Eric, 20:00 UK, 21:00 CET).
>>>
>>> I dare say that Bill Roberts, chair of that subgroup, would be happy for
>>> anyone in DWBP who wishes to join that call. Details at [2].
>>>
>>> Legal disclaimer:
>>> Please note that the SDW WG is run jointly with the OGC and therefore the
>>> output will be a joint OGC/W3C specification. In addition to the usual W3C
>>> rules, (almost exactly) the same rules apply for OGC; it's just handled
>>> differently. See [3].
>>>
>>> Phil.
>>>
>>> [1] http://w3c.github.io/dwbp/bp.html#EnableDataSubsetting
>>> [2] https://www.w3.org/2015/spatial/wiki/Meetings:Coverage-Telecon20160323
>>> [3] https://www.w3.org/2015/spatial/wiki/Patent_Call
>>>
>>> --
>>>
>>>
>>> Phil Archer
>>> W3C Data Activity Lead
>>> http://www.w3.org/2013/data/
>>>
>>> http://philarcher.org
>>> +44 (0)7887 767755
>>> @philarcher1
>>>
>>>
>>>
>>
Received on Wednesday, 23 March 2016 23:43:07 UTC
