RE: Subsetting data

Another way of looking at it is that a query, encoded as a URI pattern, defines an implicit set of potential URIs, each of which denotes a subset.
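To make the "implicit set" idea concrete, here is a minimal Python sketch. It assumes a hypothetical pattern with {name} placeholders (the example.org URI and the region/year parameters are illustrative only, not from any real service). The set of URIs is never enumerated: membership is a test, and a matching URI yields the parameter bindings that would drive the query.

```python
import re

# Hypothetical URI pattern: each {name} placeholder stands for a query parameter.
PATTERN = "http://example.org/obs/{region}/{year}"

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Compile a {name}-style URI pattern into a regex in which each
    placeholder matches exactly one path segment (no '/', '?' or '#')."""
    # re.split with a capturing group alternates: literal, name, literal, ...
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>[^/?#]+)"
        for i, part in enumerate(parts)
    )
    return re.compile(regex)

def bindings(pattern: str, uri: str):
    """Return the parameter bindings if `uri` belongs to the implicit set
    defined by `pattern`, otherwise None."""
    m = pattern_to_regex(pattern).fullmatch(uri)
    return m.groupdict() if m else None
```

For example, bindings(PATTERN, "http://example.org/obs/north/2015") gives {'region': 'north', 'year': '2015'}, while a URI with a missing or extra segment gives None. Testing membership rather than enumerating matters because the domain of a parameter (a bounding box, a time instant) may be unbounded.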

Simon J D Cox
Environmental Informatics
CSIRO Land and Water

E simon.cox@csiro.au T +61 3 9545 2365 M +61 403 302 672
Physical: Central Reception, Bayview Avenue, Clayton, Vic 3168
Deliveries: Gate 3, Normanby Road, Clayton, Vic 3168
Postal: Private Bag 10, Clayton South, Vic 3169
http://people.csiro.au/Simon-Cox
http://orcid.org/0000-0002-3884-3420
http://researchgate.net/profile/Simon_Cox3

________________________________
From: Phil Archer
Sent: Wednesday, 30 December 2015 6:31:16 PM
To: Manolis Koubarakis; 'public-sdw-comments@w3.org'; Annette Greiner; Eric Stephan; Tandy, Jeremy; public-dwbp-comments@w3.org
Subject: Subsetting data

At various times in recent months I have promised to look into the topic
of persistent identifiers for subsets of data. This came up at the SDW
F2F in Sapporo but has also been raised by Annette in DWBP. In between
festive activities I've been giving this some thought and have begun
to commit some ideas to a page [1].

During the CEO-LD meeting, Jeremy pointed to OpenSearch as a possible
way forward, including its geo-temporal extensions defined by the OGC.
There is also the Linked Data API as a means of doing this, and what
they both have in common is that they offer an intermediate layer that
turns a URL into a query.
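A rough sketch of such an intermediate layer in Python. The parameter names (bbox, start, end), the west,south,east,north order of the box and the /datasets/{name} path layout are all assumptions, loosely modelled on the OpenSearch geo and time extensions rather than taken from either specification or from the Linked Data API. The point is only that the layer turns a URL into a structured query that a back end can execute.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple
from urllib.parse import urlsplit, parse_qs

@dataclass(frozen=True)
class SubsetQuery:
    dataset: str
    bbox: Optional[Tuple[float, float, float, float]]  # west, south, east, north (assumed order)
    start: Optional[date]
    end: Optional[date]

def url_to_query(url: str) -> SubsetQuery:
    """Translate a subset URL into a back-end-neutral query object."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    # Assumed layout: the last path segment names the dataset.
    dataset = parts.path.rstrip("/").rsplit("/", 1)[-1]
    bbox = None
    if "bbox" in params:
        # Unpacking raises ValueError unless exactly four numbers are given.
        w, s, e, n = (float(v) for v in params["bbox"][0].split(","))
        bbox = (w, s, e, n)
    start = date.fromisoformat(params["start"][0]) if "start" in params else None
    end = date.fromisoformat(params["end"][0]) if "end" in params else None
    return SubsetQuery(dataset, bbox, start, end)
```

For instance, "http://example.org/datasets/rainfall?bbox=-3.0,53.3,-2.1,53.6&start=2015-01-01&end=2015-12-31" becomes a query on the rainfall dataset for that box and calendar year; omitted parameters simply leave that dimension unconstrained.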

How do you define a persistent identifier for a subset of a dataset? IMO
you mint a URI and say "this identifies a subset of a dataset" - and
then provide a means of programmatically going from the URI to a query
that returns the subset. As long as you can replace the intermediate
layer with another one that also returns the same subset, we're done.
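One way to picture the "replaceable layer" test, as a hedged Python sketch: the minted URI is bound to a declarative subset definition, and two interchangeable resolvers (one filtering in memory, one using SQLite) answer it. The dataset, the URI and the field names are hypothetical; what carries over is the check that both layers return the same subset for the same identifier.

```python
import sqlite3
from typing import Protocol

# Hypothetical toy dataset of observations: (id, region, year).
OBS = [(1, "north", 2014), (2, "south", 2014), (3, "north", 2015)]

# The persistent part: a minted URI bound to a declarative subset definition.
SUBSETS = {
    "http://example.org/id/subset/north-observations": {"region": "north"},
}

class Resolver(Protocol):
    def resolve(self, uri: str) -> set: ...

class InMemoryResolver:
    """Answers a subset definition by filtering Python tuples."""
    def resolve(self, uri: str) -> set:
        region = SUBSETS[uri]["region"]
        return {row[0] for row in OBS if row[1] == region}

class SQLiteResolver:
    """A replacement layer: the same definition answered with SQL."""
    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE obs (id INTEGER, region TEXT, year INTEGER)")
        self.conn.executemany("INSERT INTO obs VALUES (?, ?, ?)", OBS)

    def resolve(self, uri: str) -> set:
        region = SUBSETS[uri]["region"]
        cur = self.conn.execute("SELECT id FROM obs WHERE region = ?", (region,))
        return {row[0] for row in cur}
```

Both resolvers map the minted URI to the records {1, 3}. Keeping the subset definition separate from either resolver is what lets one layer be swapped for the other without the identifier changing meaning.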

The UK Government Linked Data examples tend to be along the lines of:

http://transport.data.gov.uk/id/stations
returns a list of all stations in Britain.

http://transport.data.gov.uk/id/stations/Manchester
returns a list of stations in Manchester

http://transport.data.gov.uk/id/stations/Manchester/Piccadilly
identifies Manchester Piccadilly station.

All of that data of course comes from a single dataset.
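The hierarchical pattern can be sketched in a few lines of Python, where each extra path segment adds one equality constraint on the same underlying dataset. The station records below are made-up sample data, and this does not reproduce how the transport.data.gov.uk service was actually implemented.

```python
from urllib.parse import urlsplit, unquote

# Hypothetical sample of a single stations dataset (not real data).
STATIONS = [
    {"city": "Manchester", "name": "Piccadilly"},
    {"city": "Manchester", "name": "Victoria"},
    {"city": "Leeds", "name": "Leeds"},
]
PREFIX = "/id/stations"
KEYS = ["city", "name"]  # assumed meaning of successive path segments

def resolve(uri: str) -> list:
    path = urlsplit(uri).path
    if not (path == PREFIX or path.startswith(PREFIX + "/")):
        raise LookupError(f"not a stations URI: {uri}")
    segments = [unquote(s) for s in path[len(PREFIX):].split("/") if s]
    if len(segments) > len(KEYS):
        raise LookupError(f"too many path segments: {uri}")
    # Each segment narrows the subset by one more field.
    return [s for s in STATIONS
            if all(s[k] == v for k, v in zip(KEYS, segments))]
```

Resolving .../id/stations returns all three records and .../id/stations/Manchester returns two; .../id/stations/Manchester/Piccadilly narrows to exactly one record, which is why the deepest level can act as the identifier for the station itself.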

Does this work in the real worlds of meteorology and LBL/PNNL?

Phil.

[1] https://github.com/w3c/sdw/blob/gh-pages/subsetting/index.md

--

Phil Archer
W3C Data Activity Lead
http://www.w3.org/2013/data/

http://philarcher.org
+44 (0)7887 767755
@philarcher1

Received on Wednesday, 30 December 2015 21:27:30 UTC