
RE: Subsetting data

From: <Simon.Cox@csiro.au>
Date: Sat, 2 Jan 2016 00:10:23 +0000
To: <phila@w3.org>, <Simon.Cox@csiro.au>, <public-sdw-comments@w3.org>, <public-dwbp-comments@w3.org>
Message-ID: <2A7346E8D9F62D4CA8D78387173A054A60341640@exmbx04-cdc.nexus.csiro.au>
> to be persistent, identifiers should not include queries against a specific API or query endpoint.

For sure. I didn't say anything about the form of the query. It may not even look like a query. OpenSearch is an obvious model for implementation-independent syntax (after all, it's just key-value pairs).

However, I do think it is worth keeping the notion of "subset = query result" in view. Sure, some query results may be more persistent than others and therefore worth denoting with a special identifier. But the same subset will still be the result of some query, so the minted identifier and the query both identify it. That's just an example of non-unique identifiers.
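
To make the non-unique identifier point concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the toy station records, the MINTED table, and the function names are hypothetical and do not describe any real service. It only shows that a minted identifier and an ad hoc key-value query can denote the same subset.

```python
from typing import Dict, FrozenSet, List

# Toy dataset (illustrative records only, not taken from any real API).
STATIONS: List[Dict[str, str]] = [
    {"name": "Piccadilly", "city": "Manchester"},
    {"name": "Victoria", "city": "Manchester"},
    {"name": "Lime Street", "city": "Liverpool"},
]


def query(params: Dict[str, str]) -> FrozenSet[str]:
    """Return the names of records that match every key-value pair."""
    return frozenset(
        record["name"]
        for record in STATIONS
        if all(record.get(key) == value for key, value in params.items())
    )


# Hypothetical table of minted identifiers. Each one is bound to the
# key-value query that currently produces the subset it denotes.
MINTED: Dict[str, Dict[str, str]] = {
    "/id/stations/Manchester": {"city": "Manchester"},
}


def dereference(identifier: str) -> FrozenSet[str]:
    """Resolve a minted identifier to the subset it denotes."""
    return query(MINTED[identifier])


# Two different identifiers, one subset.
by_minted_id = dereference("/id/stations/Manchester")
by_query = query({"city": "Manchester"})
```

Here by_minted_id and by_query compare equal, so the subset has at least two identifiers: the minted one and the query itself.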

Simon J D Cox
Research Scientist
Environmental Information Infrastructures
Land and Water
CSIRO

E simon.cox@csiro.au T +61 3 9545 2365 M +61 403 302 672
Physical: Reception Central, Bayview Avenue, Clayton, Vic 3168
Deliveries: Gate 3, Normanby Road, Clayton, Vic 3168
Postal: Private Bag 10, Clayton South, Vic 3169
people.csiro.au/Simon-Cox
orcid.org/0000-0002-3884-3420
researchgate.net/profile/Simon_Cox3

________________________________
From: Phil Archer
Sent: Friday, 1 January 2016 9:05:25 AM
To: Cox, Simon (L&W, Clayton); public-sdw-comments@w3.org; public-dwbp-comments@w3.org
Subject: Re: Subsetting data



On 30/12/2015 21:26, Simon.Cox@csiro.au wrote:
> Another way of looking at it is that a query, encoded as a URI pattern, defines an implicit set of potential URIs, each of which denotes a subset.

True, but to be persistent, identifiers should not include queries
against a specific API or query endpoint. That, for me, is the key
point. OpenSearch provides a model where a query is included in a URL
that can be considered persistent because there is a layer of
indirection that could be changed without the URL changing, but a URL
that includes a SQL or SPARQL query directly must be considered
ephemeral IMO.
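
A rough sketch of that layer of indirection, in Python. The path pattern borrows the transport.data.gov.uk examples quoted below, but the two backend query builders, the vocabulary in the SPARQL string, and the table and column names in the SQL string are all hypothetical. The point is only that the identifier stays fixed while the query behind it can be replaced.

```python
import re
from typing import Callable, Optional

# The persistent part: a path pattern that contains no query syntax.
STATIONS_IN_CITY = re.compile(r"^/id/stations/(?P<city>[A-Za-z]+)$")


def sparql_backend(city: str) -> str:
    """Hypothetical SPARQL query; the vocabulary is made up."""
    return f'SELECT ?s WHERE {{ ?s a :Station ; :city "{city}" }}'


def sql_backend(city: str) -> str:
    """Hypothetical SQL query; table and column names are made up."""
    return f"SELECT id FROM stations WHERE city = '{city}'"


def resolve(path: str, backend: Callable[[str], str]) -> Optional[str]:
    """Turn a persistent path into a backend query, or None if no match.

    The regular expression only admits ASCII letters, so the captured
    value cannot carry quote characters into the generated query.
    """
    match = STATIONS_IN_CITY.match(path)
    if match is None:
        return None
    return backend(match.group("city"))


# Same identifier, two interchangeable intermediate layers.
path = "/id/stations/Manchester"
as_sparql = resolve(path, sparql_backend)
as_sql = resolve(path, sql_backend)
```

Swapping sparql_backend for sql_backend changes what runs behind the scenes, but the path that was published never changes.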

Phil


>
> Simon J D Cox
> Environmental Informatics
> CSIRO Land and Water
>
> E simon.cox@csiro.au T +61 3 9545 2365 M +61 403 302 672
> Physical: Central Reception, Bayview Avenue, Clayton, Vic 3168
> Deliveries: Gate 3, Normanby Road, Clayton, Vic 3168
> Postal: Private Bag 10, Clayton South, Vic 3169
> http://people.csiro.au/Simon-Cox
> http://orcid.org/0000-0002-3884-3420
> http://researchgate.net/profile/Simon_Cox3
>
> ________________________________
> From: Phil Archer
> Sent: Wednesday, 30 December 2015 6:31:16 PM
> To: Manolis Koubarakis; 'public-sdw-comments@w3.org'; Annette Greiner; Eric Stephan; Tandy, Jeremy; public-dwbp-comments@w3.org
> Subject: Subsetting data
>
> At various times in recent months I have promised to look into the topic
> of persistent identifiers for subsets of data. This came up at the SDW
> F2F in Sapporo but has also been raised by Annette in DWBP. In between
> festive activities I've been giving this some thought and have tried to
> begin to commit some ideas to a page [1].
>
> During the CEO-LD meeting, Jeremy pointed to OpenSearch as a possible
> way forward, including its geo-temporal extensions defined by the OGC.
> There is also the Linked Data API as a means of doing this, and what
> they both have in common is that they offer an intermediate layer that
> turns a URL into a query.
>
> How do you define a persistent identifier for a subset of a dataset? IMO
> you mint a URI and say "this identifies a subset of a dataset" - and
> then provide a means of programmatically going from the URI to a query
> that returns the subset. As long as you can replace the intermediate
> layer with another one that also returns the same subset, we're done.
>
> The UK Government Linked Data examples tend to be along the lines of:
>
> http://transport.data.gov.uk/id/stations
> returns a list of all stations in Britain.
>
> http://transport.data.gov.uk/id/stations/Manchester
> returns a list of stations in Manchester
>
> http://transport.data.gov.uk/id/stations/Manchester/Piccadilly
> identifies Manchester Piccadilly station.
>
> All of that data of course comes from a single dataset.
>
> Does this work in the real worlds of meteorology and UBL/PNNL?
>
> Phil.
>
>
>
>
> [1] https://github.com/w3c/sdw/blob/gh-pages/subsetting/index.md
>
>
>
>
> --
>
>
> Phil Archer
> W3C Data Activity Lead
> http://www.w3.org/2013/data/
>
> http://philarcher.org
> +44 (0)7887 767755
> @philarcher1
>
>

Received on Saturday, 2 January 2016 00:11:15 UTC
