
Re: Subsetting data

From: Phil Archer <phila@w3.org>
Date: Fri, 1 Jan 2016 09:38:48 +0000
To: Dan Brickley <danbri@google.com>, Clemens Portele <portele@interactive-instruments.de>, Rob Atkinson <rob@metalinkage.com.au>
Cc: Simon Cox <Simon.Cox@csiro.au>, amgreiner@lbl.gov, ericphb@gmail.com, jeremy.tandy@metoffice.gov.uk, koubarak@di.uoa.gr, public-dwbp-comments@w3.org, public-sdw-comments@w3.org
Message-ID: <56864928.9080507@w3.org>

On 31/12/2015 11:09, Dan Brickley wrote:
> Isn't a "subset" just a query result, of which there are effectively an
> unlimited number?


> Storing a query so it can be re-run against evolving data has value. Having
> a URI for that, perhaps less so.

http://xmlns.com/foaf/spec/ for example?

That's a very stable URI for an evolving document/dataset that many 
people find useful ;-)


> Dan
> On Thu, 31 Dec 2015, 08:14 Clemens Portele <
> portele@interactive-instruments.de> wrote:
>> Rob,
>> what you describe seems to apply to the dataset (resource) the same way it
>> would apply to any subset resource. I.e. are you discussing a more general
>> question, not the subsetting question?
>> Phil,
>> a (probably often unproblematic) restriction to the temperature/uk/london
>> or stations/manchester approach is that there is only one path, so you end
>> up with limitations on the subsets. If you want to support multiple
>> subsets, e.g. also stations where high speed trains stop, stations that
>> have a ticket shop, etc. then there are several issues with a
>> /{dataset}/{subset}/…/{subset}/{object} approach. These include an unclear
>> URI scheme ("manchester" and "eurostar" would be on the same path level),
>> potential name collisions of subset names of different subsetting
>> categories, and multiple URIs for the same feature/object.
>> Best regards,
>> Clemens
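
A rough sketch of the two problems Clemens lists for a /{dataset}/{subset}/{object} scheme: multiple URIs for the same object, and subset names that collide across subsetting categories. The category names, subset names and station memberships below are invented for illustration and are not taken from any real dataset.

```python
# Hypothetical subsetting categories; all names and memberships are
# illustrative assumptions, not real transport data.
SUBSETS = {
    "city": {"manchester": {"piccadilly", "victoria"}, "london": {"st-pancras"}},
    "service": {"eurostar": {"st-pancras"}},
    "region": {"northern": {"piccadilly", "victoria"}},
    "operator": {"northern": {"piccadilly"}},
}


def path_uris(station):
    """Return every /stations/{subset}/{object} path that reaches a station."""
    return sorted({
        f"/stations/{subset}/{station}"
        for members_by_subset in SUBSETS.values()
        for subset, members in members_by_subset.items()
        if station in members
    })


def colliding_names():
    """Return subset names that appear in more than one category."""
    seen = {}
    for category, members_by_subset in SUBSETS.items():
        for subset in members_by_subset:
            seen.setdefault(subset, []).append(category)
    return {name: cats for name, cats in seen.items() if len(cats) > 1}


if __name__ == "__main__":
    print(path_uris("st-pancras"))
    print(colliding_names())
```

With this toy data, "st-pancras" is reachable under both the "london" and "eurostar" paths, and "northern" is ambiguous because the path gives no hint whether it names a region or an operator.
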
>> On 31 Dec 2015, at 03:07, Rob Atkinson <rob@metalinkage.com.au> wrote:
>> I'm not a strong set-theoretician - but it strikes me there are some
>> tensions here:
>> Does the identifier of a set mean that the members of that set are
>> constant, known in advance and always retrievable?   Is a query endpoint a
>> resource (does either URI or URL have meaning against a query that delivers
>> real time data - including the use case of "at this point in time we think
>> these things are members of this set?" )
>> If the subset is the result of a query - and you care that it is the same
>> subset another time you look at it - are you actually assigning an
>> identifier to the artefact - which is the query response, whose properties
>> include the original query, where it was made, and the time it was made?
>> Can you define an ontology for terms like subset, query, response that you
>> all agree on?
>> I share Phil's implicit concern that subsetting by type with URI patterns
>> may not be universally applicable - IMHO that equates to a "sub-register"
>> pattern, where a set has its members defined by some identifiable process
>> (independent of any query functions available) - which may include explicit
>> subsets - for example by object type, or delegated registration processes.
>> That probably fits the UK implementation better than a query-defined
>> subset.
>> If subsets have some prior meaning - and a query is used to access them
>> from a service endpoint - then the query is a URL that needs to be bound to
>> the object URI. AFAICT that's a very different thing to saying an arbitrary
>> query result defines a subset of data.
>> I think you may, in general, assign an ID to the artefact which is the
>> result of a query at a given time, and if you want to make that into
>> something with more semantics then you need to make it into a new type of
>> object which can be described in terms of what it means. I think currently
>> the conversation is conflating these two perspectives of "subset".
>> Cheers, and farewell to 2015.
>> Rob Atkinson.
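
One possible reading of Rob's "identifier for the artefact" can be sketched as follows: the query response is recorded together with the original query, where it was made and when, and an identifier is derived from all of those. The class name, the `snapshot` helper and the `urn:example:` identifier scheme are assumptions made for this sketch, not something proposed in the thread.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class QueryResponse:
    """A query result treated as an artefact with its own provenance."""
    query: str          # the original query
    endpoint: str       # where it was made
    retrieved_at: str   # when it was made (ISO 8601)
    members: tuple      # what came back at that time

    @property
    def identifier(self):
        # Hypothetical scheme: hash the provenance together with the members.
        payload = json.dumps(
            {
                "query": self.query,
                "endpoint": self.endpoint,
                "retrieved_at": self.retrieved_at,
                "members": list(self.members),
            },
            sort_keys=True,
        )
        digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        return "urn:example:response:" + digest[:16]


def snapshot(query, endpoint, run_query, now=None):
    """Run a query via the supplied callable and freeze the result."""
    now = now or datetime.now(timezone.utc)
    members = tuple(sorted(run_query(query)))
    return QueryResponse(query, endpoint, now.isoformat(), members)
```

Under this scheme the same query run at two different times yields two different identifiers, even if the members happen to be identical, which is the distinction between the artefact and a subset with prior meaning.
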
>> On Thu, 31 Dec 2015 at 08:26 <Simon.Cox@csiro.au> wrote:
>>> Another way of looking at it is that a query, encoded as a URI pattern,
>>> defines an implicit set of potential URIs, each of which denotes a subset.
>>> Simon J D Cox
>>> Environmental Informatics
>>> CSIRO Land and Water
>>> E simon.cox@csiro.au T +61 3 9545 2365 M +61 403 302 672
>>> Physical: Central Reception, Bayview Avenue, Clayton, Vic 3168
>>> Deliveries: Gate 3, Normanby Road, Clayton, Vic 3168
>>> Postal: Private Bag 10, Clayton South, Vic 3169
>>> http://people.csiro.au/Simon-Cox
>>> http://orcid.org/0000-0002-3884-3420
>>> http://researchgate.net/profile/Simon_Cox3
>>> ------------------------------
>>> *From:* Phil Archer
>>> *Sent:* Wednesday, 30 December 2015 6:31:16 PM
>>> *To:* Manolis Koubarakis; 'public-sdw-comments@w3.org'; Annette Greiner;
>>> Eric Stephan; Tandy, Jeremy; public-dwbp-comments@w3.org
>>> *Subject:* Subsetting data
>>> At various times in recent months I have promised to look into the topic
>>> of persistent identifiers for subsets of data. This came up at the SDW
>>> F2F in Sapporo but has also been raised by Annette in DWBP. In between
>>> festive activities I've been giving this some thought and have tried to
>>> begin to commit some ideas to a page [1].
>>> During the CEO-LD meeting, Jeremy pointed to OpenSearch as a possible
>>> way forward, including its geo-temporal extensions defined by the OGC.
>>> There is also the Linked Data API as a means of doing this, and what
>>> they both have in common is that they offer an intermediate layer that
>>> turns a URL into a query.
>>> How do you define a persistent identifier for a subset of a dataset? IMO
>>> you mint a URI and say "this identifies a subset of a dataset" - and
>>> then provide a means of programmatically going from the URI to a query
>>> that returns the subset. As long as you can replace the intermediate
>>> layer with another one that also returns the same subset, we're done.
>>> The UK Government Linked Data examples tend to be along the lines of:
>>> http://transport.data.gov.uk/id/stations
>>> returns a list of all stations in Britain.
>>> http://transport.data.gov.uk/id/stations/Manchester
>>> returns a list of stations in Manchester
>>> http://transport.data.gov.uk/id/stations/Manchester/Piccadilly
>>> identifies Manchester Piccadilly station.
>>> All of that data of course comes from a single dataset.
>>> Does this work in the real worlds of meteorology and LBL/PNNL?
>>> Phil.
>>> [1] https://github.com/w3c/sdw/blob/gh-pages/subsetting/index.md
>>> --
>>> Phil Archer
>>> W3C Data Activity Lead
>>> http://www.w3.org/2013/data/
>>> http://philarcher.org
>>> +44 (0)7887 767755
>>> @philarcher1
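
The "intermediate layer that turns a URL into a query" from the original message can be sketched minimally as below, using the stations URIs quoted above. The in-memory dataset, the field names `city` and `name`, and the functions `uri_to_query` and `resolve` are all assumptions for illustration; this is not how the UK Government service, OpenSearch or the Linked Data API is actually implemented.

```python
from urllib.parse import urlparse

# Hypothetical stand-in for the single underlying stations dataset.
STATIONS = [
    {"city": "Manchester", "name": "Piccadilly"},
    {"city": "Manchester", "name": "Victoria"},
    {"city": "Leeds", "name": "Leeds"},
]

BASE_PATH = "/id/stations"


def uri_to_query(uri):
    """Translate a subset URI into field constraints (the 'query')."""
    path = urlparse(uri).path.rstrip("/")
    if path != BASE_PATH and not path.startswith(BASE_PATH + "/"):
        raise ValueError(f"not a stations URI: {uri}")
    segments = [s for s in path[len(BASE_PATH):].split("/") if s]
    if len(segments) > 2:
        raise ValueError(f"too many path segments: {uri}")
    # First segment constrains the city, second the station name.
    return dict(zip(("city", "name"), segments))


def resolve(uri, dataset=STATIONS):
    """Return the subset of the dataset that the URI identifies."""
    query = uri_to_query(uri)
    return [row for row in dataset
            if all(row[key] == value for key, value in query.items())]


if __name__ == "__main__":
    for uri in (
        "http://transport.data.gov.uk/id/stations",
        "http://transport.data.gov.uk/id/stations/Manchester",
        "http://transport.data.gov.uk/id/stations/Manchester/Piccadilly",
    ):
        print(uri, "->", resolve(uri))
```

The URI stays fixed while `resolve` could be swapped for a different backend, as long as the replacement returns the same subset for the same URI.
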


Phil Archer
W3C Data Activity Lead

+44 (0)7887 767755
Received on Friday, 1 January 2016 09:38:10 UTC
