Re: Subsetting BP

Hi all,

The discussion with the SDW team was very good! I agree with Phil that
Annette's proposal is right, but I think we should shift the focus of the
BP a little. IMO the BP should be about "Provide data subsets for large
datasets" instead of "Enabling subsetting". That way, we can test whether
or not large datasets can actually be retrieved.
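
To make the test idea concrete, here is a rough sketch only, not something
taken from the BP draft. It combines the 10-second rule of thumb Annette
mentions with an assumed 10 Mbit/s consumer-level link (an illustrative
figure, not one anyone proposed) and lists the subsets that would take too
long to pull down. The Subset class, the function names and the default
values are all hypothetical.

```python
# Hypothetical sketch: names and default numbers are assumptions,
# not part of the DWBP document.
from dataclasses import dataclass


@dataclass
class Subset:
    identifier: str   # e.g. a URI for a predefined slice (made up here)
    size_bytes: int


def transfer_seconds(size_bytes: int, bandwidth_mbit_s: float) -> float:
    """Idealised transfer time; ignores latency, overhead and server time."""
    return (size_bytes * 8) / (bandwidth_mbit_s * 1_000_000)


def subsets_over_budget(subsets, bandwidth_mbit_s=10.0, budget_s=10.0):
    """Return identifiers of subsets that would exceed the time budget."""
    return [
        s.identifier
        for s in subsets
        if transfer_seconds(s.size_bytes, bandwidth_mbit_s) > budget_s
    ]
```

Under these assumptions a 40 MB slice needs about 32 seconds and fails the
test, while a 5 MB slice needs 4 seconds and passes.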

kind regards,
Bernadette

2016-03-23 23:30 GMT-03:00 Eric Stephan <ericphb@gmail.com>:

> Annette,
>
> +1
>
> From the UX perspective (web browser + smart phone app) other measures
> might be:
>
>  * number of consumer processing steps reduced through subsetting
>    (compared to collecting the entire dataset)
>
>  * resulting size of memory footprint provided by subsetting versus
>    footprint required by the whole dataset.
>
> Kind regards,
>
> Eric S
>
>
> On Wed, Mar 23, 2016 at 4:45 PM, Annette Greiner <amgreiner@lbl.gov>
> wrote:
>
>> I like the idea of a test that updates over time like that. I think a
>> goal could be to enable a web app to pull down each bit of data within 10
>> seconds on a consumer-level network. Ten seconds is a rule of thumb in UX
>> circles for what feels like a reasonable time to wait for an app to
>> respond, *if* the user is given an indicator that the app is working. That
>> would still make sense over time as networks get faster.
>> -Annette
>>
>>
>> On 3/23/16 3:33 PM, Phil Archer wrote:
>>
>>> Thanks Eric,
>>>
>>> Newton and Bernadette were able to join us and we had a useful
>>> discussion about subsetting. The minutes are at
>>> https://www.w3.org/2016/03/23-sdwcov-minutes. My understanding was
>>> that, as we discussed in our own call earlier, the difficulty is that it is
>>> almost impossible to talk about this in the abstract.
>>>
>>> Jeremy Tandy said: it makes sense for DWBP to provide some advice --
>>>     if you have data that is too big for a web application then
>>>     provide a mechanism to get hold of bits of it
>>>     ... e.g. using predefined slices or an API
>>>     ... test by "here is a massive dataset -- can you work with it
>>>     in a browser app?"
>>>
>>> So my understanding - and it is no more than my understanding which may
>>> be inaccurate - is that there is agreement on:
>>>
>>> - bulk download is a BP, meaning, you should make all the data available
>>> for download, probably not in real time, for local processing.
>>>
>>> - If the dataset is large, it's a good idea to make subsets available,
>>> which can be done through an API and/or through defining subsets and giving
>>> them identifiers.
>>>
>>> - What that API looks like, or how to construct those URIs is always
>>> going to be specific to the dataset.
>>>
>>> What is not clear is whether we can create a genuine BP around this.
>>>
>>> Newton (rightly) asks how you can test it. Jeremy suggested - but it was
>>> on the hoof and shouldn't be taken as gospel - that a test might be whether
>>> the dataset is processable within a browser. Today's browsers can handle
>>> around 40MB without breaking into a sweat - 10 years ago, 1 MB might have
>>> caused problems, so the test advances with time nicely.
>>>
>>> IMHO, what Annette wrote is right (or very close to it), and the single
>>> bus route example is a good one; but I know we haven't reached a consensus
>>> view.
>>>
>>> We could readily add in another example and could, perhaps, explicitly
>>> talk about spatial coverages, payments data, and statistics as examples of
>>> datasets that can be very large but for which many applications only ever
>>> want a subset.
>>>
>>> On your question about regular time slots, no, the time is about to
>>> change. The switch to DST in the northern hemisphere and away from it in
>>> the south means SDW is about to switch time slots. I can advise when the
>>> new time has been decided, but it's likely to be between 6 and 8 am your
>>> time.
>>>
>>> Phil.
>>>
>>> On 23/03/2016 21:04, Eric Stephan wrote:
>>>
>>>> Phil,
>>>>
>>>> I just saw this note, thanks for reaching out; it would have been nice
>>>> to participate. If this is a recurring meeting time I'd like to
>>>> participate, especially with the DUV activities winding down.
>>>>
>>>> Kind regards,
>>>>
>>>> Eric S
>>>>
>>>> On Wed, Mar 23, 2016 at 9:13 AM, Phil Archer <phila@w3.org> wrote:
>>>>
>>>>> Just to let DWBP folks know that the Subsetting BP [1] is on the agenda
>>>>> for one of the Spatial Data WG's subgroup calls, which takes place at
>>>>> 20:00 UTC today (13:00 for Annette and Eric, 20:00 UK, 21:00 CET).
>>>>>
>>>>> I dare say that Bill Roberts, chair of that subgroup, would be happy for
>>>>> anyone in DWBP to join that call. Details at [2].
>>>>>
>>>>> Legal disclaimer:
>>>>> Please note that the SDW WG is run jointly with the OGC and therefore the
>>>>> output will be a joint OGC/W3C specification. In addition to the usual W3C
>>>>> rules, (almost exactly) the same rules apply for OGC; they are just handled
>>>>> differently. See [3].
>>>>>
>>>>> Phil.
>>>>>
>>>>> [1] http://w3c.github.io/dwbp/bp.html#EnableDataSubsetting
>>>>> [2] https://www.w3.org/2015/spatial/wiki/Meetings:Coverage-Telecon20160323
>>>>> [3] https://www.w3.org/2015/spatial/wiki/Patent_Call
>>>>>
>>>>> --
>>>>>
>>>>>
>>>>> Phil Archer
>>>>> W3C Data Activity Lead
>>>>> http://www.w3.org/2013/data/
>>>>>
>>>>> http://philarcher.org
>>>>> +44 (0)7887 767755
>>>>> @philarcher1
>>>>>
>>>>>
>>>>>
>>>>
>>>
>> --
>> Annette Greiner
>> NERSC Data and Analytics Services
>> Lawrence Berkeley National Laboratory
>>
>>
>>
>


-- 
Bernadette Farias Lóscio
Centro de Informática
Universidade Federal de Pernambuco - UFPE, Brazil
----------------------------------------------------------------------------

Received on Thursday, 24 March 2016 14:41:53 UTC