
Re: Subsetting data

From: Dan Brickley <danbri@google.com>
Date: Thu, 31 Dec 2015 11:09:08 +0000
Message-ID: <CAK-qy=5BkqrORAWuWxb8mFmW3-dd9Pucyjc3f-ZOy=0uF-gWDw@mail.gmail.com>
To: Clemens Portele <portele@interactive-instruments.de>, Rob Atkinson <rob@metalinkage.com.au>
Cc: Phil Archer <phila@w3.org>, Simon Cox <Simon.Cox@csiro.au>, amgreiner@lbl.gov, ericphb@gmail.com, jeremy.tandy@metoffice.gov.uk, koubarak@di.uoa.gr, public-dwbp-comments@w3.org, public-sdw-comments@w3.org
Isn't a "subset" just a query result, of which there are effectively an
unlimited number?

Storing a query so it can be re-run against evolving data has value. Having
a URI for that, perhaps less so.
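A minimal sketch of that distinction, assuming a dataset modelled as a list of Python dicts and a query modelled as a named predicate (both hypothetical): the stored query stays fixed while the subset it returns changes as the data evolves.

```python
from dataclasses import dataclass
from typing import Callable

Record = dict  # hypothetical: one station record


@dataclass(frozen=True)
class StoredQuery:
    """A named, re-runnable query; the name is what might get an identifier."""
    name: str
    predicate: Callable[[Record], bool]

    def run(self, dataset: list[Record]) -> list[str]:
        # Re-evaluate against whatever the dataset contains right now.
        return [r["id"] for r in dataset if self.predicate(r)]


manchester = StoredQuery("stations-in-manchester", lambda r: r["city"] == "Manchester")

dataset = [
    {"id": "Piccadilly", "city": "Manchester"},
    {"id": "Victoria", "city": "Manchester"},
    {"id": "Euston", "city": "London"},
]
before = manchester.run(dataset)

# The data evolves: a new station is added (hypothetical record).
dataset.append({"id": "NewStation", "city": "Manchester"})
after = manchester.run(dataset)

print(before)  # ['Piccadilly', 'Victoria']
print(after)   # ['Piccadilly', 'Victoria', 'NewStation']
```

The same query yields two different member lists, so identifying the query and identifying one particular result are separate decisions.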

Dan

On Thu, 31 Dec 2015, 08:14 Clemens Portele <portele@interactive-instruments.de> wrote:

> Rob,
>
> what you describe seems to apply to the dataset (resource) in the same way
> it would apply to any subset resource. In other words, are you raising a
> more general question rather than the subsetting question?
>
> Phil,
>
> a (probably often unproblematic) restriction of the temperature/uk/london
> or stations/manchester approach is that there is only one path, so you end
> up with limitations on the subsets. If you want to support multiple
> subsets, e.g. also stations where high-speed trains stop, stations that
> have a ticket shop, etc. then there are several issues with a
> /{dataset}/{subset}/…/{subset}/{object} approach. These include an unclear
> URI scheme ("manchester" and "eurostar" would be on the same path level),
> potential name collisions of subset names of different subsetting
> categories, and multiple URIs for the same feature/object.
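One way to avoid those issues is to give each subsetting category its own named query parameter instead of a path level. A minimal sketch, assuming hypothetical `city` and `served_by` filter parameters on the stations collection: "Manchester" and "eurostar" no longer compete for the same path position, and sorting the parameters gives one canonical URI per combination of filters.

```python
from urllib.parse import urlencode

BASE = "http://transport.data.gov.uk/id/stations"


def subset_uri(**filters: str) -> str:
    """Build a subset URI; sorted keys give one canonical URI per filter set."""
    if not filters:
        return BASE
    return BASE + "?" + urlencode(sorted(filters.items()))


# Hypothetical filter names; the order they are given in does not matter.
a = subset_uri(city="Manchester", served_by="eurostar")
b = subset_uri(served_by="eurostar", city="Manchester")
print(a)       # http://transport.data.gov.uk/id/stations?city=Manchester&served_by=eurostar
print(a == b)  # True
```

Each station can then keep a single object URI, independent of which subsets it happens to fall into.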
>
> Best regards,
> Clemens
>
>
> On 31 Dec 2015, at 03:07, Rob Atkinson <rob@metalinkage.com.au> wrote:
>
> I'm not a strong set-theoretician - but it strikes me there are some
> tensions here:
>
> Does the identifier of a set mean that the members of that set are
> constant, known in advance and always retrievable? Is a query endpoint a
> resource? (Does either a URI or a URL have meaning for a query that
> delivers real-time data, including the use case of "at this point in time
> we think these things are members of this set"?)
>
> If the subset is the result of a query - and you care that it is the same
> subset another time you look at it - are you actually assigning an
> identifier to the artefact - which is the query response, whose properties
> include the original query, where it was made, and the time it was made?
>
> Can you define an ontology for terms like subset, query, response that you
> all agree on?
>
> I share Phil's implicit concern that subsetting by type with URI patterns
> may not be universally applicable - IMHO that equates to a "sub-register"
> pattern, where a set has its members defined by some identifiable process
> (independent of any query functions available) - which may include explicit
> subsets - for example by object type, or delegated registration processes.
> That probably fits the UK implementation better than a query-defined
> subset.
>
> If subsets have some prior meaning - and a query is used to access them
> from a service endpoint - then the query is a URL that needs to be bound to
> the object URI. AFAICT that's a very different thing to saying an arbitrary
> query result defines a subset of data.
>
> I think you may, in general, assign an ID to the artefact which is the
> result of a query at a given time, and if you want to make that into
> something with more semantics then you need to make it into a new type of
> object which can be described in terms of what it means. I think currently
> the conversation is conflating these two perspectives of "subset".
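The first perspective, an identifier for the query response as an artefact, can be sketched as a record carrying the original query, where it was made and when. The identifier scheme (a truncated SHA-256 hash under a hypothetical `http://example.org/responses/` base) is an assumption for illustration only; giving a subset further semantics would need a separately defined type, which this sketch does not attempt.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone

BASE = "http://example.org/responses/"  # hypothetical identifier namespace


@dataclass(frozen=True)
class QueryResponse:
    """A query result treated as an artefact with its own identity."""
    query: str                # the original query
    endpoint: str             # where the query was made
    executed_at: datetime     # when the query was made
    members: tuple[str, ...]  # what was returned at that moment

    @property
    def identifier(self) -> str:
        # Hypothetical scheme: hash of (query, endpoint, time). Members are left
        # out of the key because these three already pin down which response is meant.
        key = f"{self.query}|{self.endpoint}|{self.executed_at.isoformat()}"
        return BASE + hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


q = "stations where city = Manchester"
ep = "http://example.org/sparql"  # hypothetical endpoint
r1 = QueryResponse(q, ep, datetime(2015, 12, 31, 3, 7, tzinfo=timezone.utc),
                   ("Piccadilly", "Victoria"))
r2 = QueryResponse(q, ep, datetime(2016, 1, 31, 3, 7, tzinfo=timezone.utc),
                   ("Piccadilly", "Victoria"))
print(r1.identifier == r2.identifier)  # False: same query and members, different time
```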
>
> Cheers, and farewell to 2015.
> Rob Atkinson.
>
>
>
>
> On Thu, 31 Dec 2015 at 08:26 <Simon.Cox@csiro.au> wrote:
>
>> Another way of looking at it is that a query, encoded as a URI pattern,
>> defines an implicit set of potential URIs, each of which denotes a subset.
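A minimal sketch of that view, assuming a hand-rolled `{name}` pattern syntax (a small subset of what RFC 6570 URI Templates allow) and a hypothetical `{city}` variable: expanding the pattern over some values enumerates part of the implicit set of URIs, and matching a URI against the pattern recovers the bindings that select one subset.

```python
import re
from itertools import product
from typing import Optional

# Hypothetical URI pattern; "{city}" is the only variable.
PATTERN = "http://transport.data.gov.uk/id/stations/{city}"


def expand(pattern: str, **domains: list[str]) -> list[str]:
    """List the URIs the pattern yields for finite value domains."""
    names = list(domains)
    return [pattern.format(**dict(zip(names, values)))
            for values in product(*(domains[n] for n in names))]


def match(pattern: str, uri: str) -> Optional[dict[str, str]]:
    """Recover the variable bindings for a URI, or None if it is not in the set."""
    # Escape the literal parts, then turn each escaped "{name}" into a named
    # group that matches exactly one path segment.
    regex = re.sub(r"\\\{(\w+)\\\}", r"(?P<\1>[^/?#]+)", re.escape(pattern))
    m = re.fullmatch(regex, uri)
    return m.groupdict() if m else None


print(expand(PATTERN, city=["Manchester", "London"]))
print(match(PATTERN, "http://transport.data.gov.uk/id/stations/Manchester"))
```

The set is only "implicit" because the domain of values is open-ended; `expand` can only ever list a finite part of it.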
>>
>> Simon J D Cox
>> Environmental Informatics
>> CSIRO Land and Water
>>
>> E simon.cox@csiro.au T +61 3 9545 2365 M +61 403 302 672
>> Physical: Central Reception, Bayview Avenue, Clayton, Vic 3168
>> Deliveries: Gate 3, Normanby Road, Clayton, Vic 3168
>> Postal: Private Bag 10, Clayton South, Vic 3169
>> http://people.csiro.au/Simon-Cox
>> http://orcid.org/0000-0002-3884-3420
>> http://researchgate.net/profile/Simon_Cox3
>>
>> ------------------------------
>> *From:* Phil Archer
>> *Sent:* Wednesday, 30 December 2015 6:31:16 PM
>> *To:* Manolis Koubarakis; 'public-sdw-comments@w3.org'; Annette Greiner;
>> Eric Stephan; Tandy, Jeremy; public-dwbp-comments@w3.org
>> *Subject:* Subsetting data
>>
>> At various times in recent months I have promised to look into the topic
>> of persistent identifiers for subsets of data. This came up at the SDW
>> F2F in Sapporo but has also been raised by Annette in DWBP. In between
>> festive activities I've been giving this some thought and have tried to
>> begin to commit some ideas to a page [1].
>>
>> During the CEO-LD meeting, Jeremy pointed to OpenSearch as a possible
>> way forward, including its geo-temporal extensions defined by the OGC.
>> There is also the Linked Data API as a means of doing this, and what
>> they both have in common is that they offer an intermediate layer that
>> turns a URL into a query.
>>
>> How do you define a persistent identifier for a subset of a dataset? IMO
>> you mint a URI and say "this identifies a subset of a dataset" - and
>> then provide a means of programmatically going from the URI to a query
>> that returns the subset. As long as you can replace the intermediate
>> layer with another one that also returns the same subset, we're done.
>>
>> The UK Government Linked Data examples tend to be along the lines of:
>>
>> http://transport.data.gov.uk/id/stations
>> returns a list of all stations in Britain.
>>
>> http://transport.data.gov.uk/id/stations/Manchester
>> returns a list of stations in Manchester
>>
>> http://transport.data.gov.uk/id/stations/Manchester/Piccadilly
>> identifies Manchester Piccadilly station.
>>
>> All of that data of course comes from a single dataset.
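The "intermediate layer that turns a URL into a query" can be sketched for the path pattern above. This assumes a few hypothetical in-memory records standing in for the single dataset and a query represented as field/value pairs; it is not how the UK Linked Data API or OpenSearch actually implement it.

```python
from urllib.parse import urlparse

STATIONS = [  # hypothetical sample records standing in for the single dataset
    {"city": "Manchester", "name": "Piccadilly"},
    {"city": "Manchester", "name": "Victoria"},
    {"city": "London", "name": "Euston"},
]

PREFIX = "/id/stations"


def uri_to_query(uri: str) -> dict[str, str]:
    """Intermediate layer: turn a path-style URI into a query (field -> value)."""
    path = urlparse(uri).path
    if path != PREFIX and not path.startswith(PREFIX + "/"):
        raise ValueError(f"not a stations URI: {uri}")
    segments = [s for s in path[len(PREFIX):].split("/") if s]
    if len(segments) > 2:
        raise ValueError(f"too many path segments: {uri}")
    # First segment is the city, second (if present) is the station name.
    return dict(zip(["city", "name"], segments))


def run_query(query: dict[str, str]) -> list[dict[str, str]]:
    """Return every record whose fields match all query terms."""
    return [r for r in STATIONS if all(r[k] == v for k, v in query.items())]


def resolve(uri: str) -> list[dict[str, str]]:
    return run_query(uri_to_query(uri))


print(len(resolve("http://transport.data.gov.uk/id/stations")))  # 3
print([r["name"] for r in resolve("http://transport.data.gov.uk/id/stations/Manchester")])
```

Under the persistence argument above, `uri_to_query` could be swapped for any other layer as long as `resolve` keeps returning the same subset. Note that here the third URI resolves to a one-element result, whereas in the example it identifies the station itself rather than a list.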
>>
>> Does this work in the real worlds of meteorology and LBL/PNNL?
>>
>> Phil.
>>
>> [1] https://github.com/w3c/sdw/blob/gh-pages/subsetting/index.md
>>
>> --
>>
>>
>> Phil Archer
>> W3C Data Activity Lead
>> http://www.w3.org/2013/data/
>>
>> http://philarcher.org
>> +44 (0)7887 767755
>> @philarcher1
>>
>>
>
Received on Thursday, 31 December 2015 11:09:46 UTC
