
RE: Subsetting data

From: <Simon.Cox@csiro.au>
Date: Mon, 4 Jan 2016 10:44:25 +0000
To: <p.baumann@jacobs-university.de>, <phila@w3.org>, <public-sdw-comments@w3.org>, <public-dwbp-comments@w3.org>
Message-ID: <2A7346E8D9F62D4CA8D78387173A054A603418EB@exmbx04-cdc.nexus.csiro.au>
Yes.

That's why I said subset=(query result), but not (query result)=subset

:-)


Simon J D Cox
Research Scientist
Land and Water (http://www.csiro.au/Organisation-Structure/Flagships/Land-and-Water)
CSIRO
E simon.cox@csiro.au T +61 3 9545 2365 M +61 403 302 672
   Physical: Reception Central, Bayview Avenue, Clayton, Vic 3168
   Deliveries: Gate 3, Normanby Road, Clayton, Vic 3168
   Postal: Private Bag 10, Clayton South, Vic 3169
http://people.csiro.au/C/S/Simon-Cox
http://orcid.org/0000-0002-3884-3420
https://www.researchgate.net/profile/Simon_Cox3
________________________________
From: Peter Baumann [p.baumann@jacobs-university.de]
Sent: Saturday, 2 January 2016 8:17 PM
To: Cox, Simon (L&W, Clayton); phila@w3.org; public-sdw-comments@w3.org; public-dwbp-comments@w3.org
Subject: Re: Subsetting data

Looking at queries is a nicely general approach (which I like); it is just that this transcends subsetting:
Subset = a set of elements that already exist in the original object (e.g. vectors from a vector bundle)
A query can, in addition, include:
- fusion = combination of more than one object, such as an image overlay
- aggregation = delivering scalars, something possibly not in the original object (such as a feature bundle, which is not a scalar) -> type change
- any other type of processing (such as rasterizing vectors, or vectorizing rasters) -> type change

Note that this narrow definition of subset covers OGC WFS / Filter Encoding right away, whereas the "extended view" does not.
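Peter's narrow definition can be made concrete with a small check. This is a minimal sketch, assuming an object is modelled as a Python list of hashable elements; the function `is_subset_result` and the sample features are hypothetical. A result counts as a subset only if it is a collection whose elements all pre-exist in the original, so aggregation (a scalar), processing (new elements) and fusion (elements from another object) all fail.

```python
from collections import Counter
from typing import Any, Hashable, Sequence

def is_subset_result(original: Sequence[Hashable], result: Any) -> bool:
    """Narrow test: `result` is a collection whose elements all pre-exist in `original`."""
    if not isinstance(result, (list, tuple)):
        # A scalar (e.g. an aggregate) is a type change, so it is not a subset.
        return False
    available = Counter(original)
    # Every element of the result must occur in the original (with multiplicity).
    return all(available[item] >= n for item, n in Counter(result).items())

# Hypothetical feature bundle: (kind, name) pairs.
features = [("road", "A1"), ("river", "Thames"), ("road", "M6")]
other = [("rail", "WCML")]

selection = [f for f in features if f[0] == "road"]     # subsetting
count = len(features)                                    # aggregation
labels = [f"{kind}:{name}" for kind, name in features]   # processing
fused = selection + other                                # fusion with another object

print(is_subset_result(features, selection))  # True
print(is_subset_result(features, count))      # False
print(is_subset_result(features, labels))     # False
print(is_subset_result(features, fused))      # False
```

All four values are query results, but only the first is a subset, which is the asymmetry in Simon's "subset=(query result), but not (query result)=subset".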

-Peter


On 2016-01-02 01:10, Simon.Cox@csiro.au wrote:
> to be persistent, identifiers should not include queries against a specific API or query endpoint.

For sure. I didn't say anything about the form of the query. It may not even look like a query. OpenSearch is an obvious model for implementation-independent syntax (after all, it's just key-value pairs).

However, I do think it is worth keeping the notion of subset=query result in view. Sure, some query results may be more persistent and therefore worthy of denotation with a special identifier. But the same subset will also be the result of some query anyway. That's just an example of non-unique identifiers.
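Simon's point that one subset can be the result of several queries, so that identifiers need not be unique, can be illustrated with key-value query strings in the spirit of OpenSearch. This is a sketch only: the example.org URLs, the `city` and `exclude_city` parameters, the `evaluate` function and the toy station data are all hypothetical and are not OpenSearch's actual parameter vocabulary.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical toy data: (station, city).
STATIONS = [
    ("Piccadilly", "Manchester"),
    ("Victoria", "Manchester"),
    ("Oxford Road", "Manchester"),
    ("Lime Street", "Liverpool"),
]

def evaluate(url: str) -> frozenset:
    """Resolve hypothetical key-value parameters against STATIONS."""
    params = {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}
    rows = STATIONS
    if "city" in params:
        rows = [s for s in rows if s[1] == params["city"]]
    if "exclude_city" in params:
        rows = [s for s in rows if s[1] != params["exclude_city"]]
    return frozenset(rows)

a = "http://example.org/stations?city=Manchester"
b = "http://example.org/stations?exclude_city=Liverpool"
print(a != b, evaluate(a) == evaluate(b))  # True True
```

The two identifiers agree only because of what the toy data currently contains; adding a station in another city would make them diverge, which is one reason a subset worth citing may merit its own identifier.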

________________________________
From: Phil Archer
Sent: Friday, 1 January 2016 9:05:25 AM
To: Cox, Simon (L&W, Clayton); public-sdw-comments@w3.org; public-dwbp-comments@w3.org
Subject: Re: Subsetting data



On 30/12/2015 21:26, Simon.Cox@csiro.au wrote:
> Another way of looking at it is that a query, encoded as a URI pattern, defines an implicit set of potential URIs, each of which denotes a subset.

True, but to be persistent, identifiers should not include queries
against a specific API or query endpoint. That, for me, is the key
point. OpenSearch provides a model where a query is included in a URL
that can be considered persistent because there is a layer of
indirection that could be changed without the URL changing, but a URL
that includes a SQL or SPARQL query directly must be considered
ephemeral IMO.
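Simon's view that a URI pattern defines an implicit set of potential URIs can be sketched as template expansion, one URI per binding of the parameters. In line with Phil's caveat, the pattern below carries only neutral key-value parameters rather than SQL or SPARQL, so the layer that turns them into a backend query could change without the URIs changing. The base URI, parameter names and values are hypothetical.

```python
from itertools import product
from typing import Dict, List
from urllib.parse import urlencode

# Hypothetical URI pattern: a fixed base plus named key-value parameters.
BASE = "http://example.org/subset"
PARAMETER_VALUES = {
    "region": ["north", "south"],
    "year": ["2014", "2015"],
}

def expand(base: str, values: Dict[str, List[str]]) -> List[str]:
    """Enumerate every URI the pattern implicitly defines (one per parameter binding)."""
    names = sorted(values)
    uris = []
    for combo in product(*(values[n] for n in names)):
        uris.append(f"{base}?{urlencode(dict(zip(names, combo)))}")
    return uris

for uri in expand(BASE, PARAMETER_VALUES):
    print(uri)
# http://example.org/subset?region=north&year=2014
# http://example.org/subset?region=north&year=2015
# http://example.org/subset?region=south&year=2014
# http://example.org/subset?region=south&year=2015
```

By contrast, a URL whose query string is itself a SQL or SPARQL statement is bound to one schema and one endpoint, which is why Phil treats it as ephemeral.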

Phil


>
> ________________________________
> From: Phil Archer
> Sent: Wednesday, 30 December 2015 6:31:16 PM
> To: Manolis Koubarakis; 'public-sdw-comments@w3.org'; Annette Greiner; Eric Stephan; Tandy, Jeremy; public-dwbp-comments@w3.org
> Subject: Subsetting data
>
> At various times in recent months I have promised to look into the topic
> of persistent identifiers for subsets of data. This came up at the SDW
> F2F in Sapporo but has also been raised by Annette in DWBP. In between
> festive activities I've been giving this some thought and have tried to
> begin to commit some ideas to a page [1].
>
> During the CEO-LD meeting, Jeremy pointed to OpenSearch as a possible
> way forward, including its geo-temporal extensions defined by the OGC.
> There is also the Linked Data API as a means of doing this, and what
> they both have in common is that they offer an intermediate layer that
> turns a URL into a query.
>
> How do you define a persistent identifier for a subset of a dataset? IMO
> you mint a URI and say "this identifies a subset of a dataset" - and
> then provide a means of programmatically going from the URI to a query
> that returns the subset. As long as you can replace the intermediate
> layer with another one that also returns the same subset, we're done.
>
> The UK Government Linked Data examples tend to be along the lines of:
>
> http://transport.data.gov.uk/id/stations
> returns a list of all stations in Britain.
>
> http://transport.data.gov.uk/id/stations/Manchester
> returns a list of stations in Manchester
>
> http://transport.data.gov.uk/id/stations/Manchester/Piccadilly
> identifies Manchester Piccadilly station.
>
> All of that data of course comes from a single dataset.
>
> Does this work in the real worlds of meteorology and UBL/PNNL?
>
> Phil.
>
> [1] https://github.com/w3c/sdw/blob/gh-pages/subsetting/index.md
>
> --
>
>
> Phil Archer
> W3C Data Activity Lead
> http://www.w3.org/2013/data/
>
> http://philarcher.org
> +44 (0)7887 767755
> @philarcher1
>
>
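Phil's recipe in the quoted message (mint a URI, map it programmatically to a query, and treat the mapping layer as replaceable) can be combined with the path-style station URIs in one sketch. Everything here is an assumption for illustration: the toy records, `to_query` and both backends are hypothetical, and nothing is implied about how transport.data.gov.uk or the Linked Data API actually resolves these URIs. The URIs and the neutral filter dictionary stay fixed while the intermediate layer is swapped.

```python
from urllib.parse import urlsplit

# Hypothetical toy stand-in for the single underlying dataset: (city, station).
DATASET = [
    ("Manchester", "Piccadilly"),
    ("Manchester", "Victoria"),
    ("Leeds", "City"),
]
FIELDS = ("city", "station")

def to_query(uri: str) -> dict:
    """Map a minted /id/stations[/<city>[/<station>]] URI to neutral filter criteria."""
    segments = [s for s in urlsplit(uri).path.split("/") if s]
    if segments[:2] != ["id", "stations"] or len(segments) > 4:
        raise ValueError(f"not a station URI: {uri}")
    return dict(zip(FIELDS, segments[2:]))

def scan_backend(query: dict) -> set:
    """Intermediate layer A: linear scan over the records."""
    return {r for r in DATASET
            if all(r[FIELDS.index(k)] == v for k, v in query.items())}

def build_index_backend(records):
    """Intermediate layer B: a city index built once, then looked up."""
    index = {}
    for city, station in records:
        index.setdefault(city, set()).add((city, station))

    def backend(query: dict) -> set:
        rows = set(records) if "city" not in query else set(index.get(query["city"], ()))
        if "station" in query:
            rows = {r for r in rows if r[1] == query["station"]}
        return rows

    return backend

index_backend = build_index_backend(DATASET)
minted = [
    "http://transport.data.gov.uk/id/stations",
    "http://transport.data.gov.uk/id/stations/Manchester",
    "http://transport.data.gov.uk/id/stations/Manchester/Piccadilly",
]
for uri in minted:
    q = to_query(uri)
    # The URI and q stay fixed; only the layer answering q is swapped.
    print(len(scan_backend(q)), scan_backend(q) == index_backend(q))
# 3 True
# 2 True
# 1 True
```

If a replacement layer returned a different set for any minted URI, the identifiers would no longer be persistent in Phil's sense; a comparison like the loop above could be run whenever the layer changes.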


--
Dr. Peter Baumann
 - Professor of Computer Science, Jacobs University Bremen
   www.faculty.jacobs-university.de/pbaumann
   mail: p.baumann@jacobs-university.de
   tel: +49-421-200-3178, fax: +49-421-200-493178
 - Executive Director, rasdaman GmbH Bremen (HRB 26793)
   www.rasdaman.com, mail: baumann@rasdaman.com
   tel: 0800-rasdaman, fax: 0800-rasdafax, mobile: +49-173-5837882
"If by chance this wandering letter, sent off on uncertain winds but entrusted to God, should stray into the hands of others, we pray that it be returned to the one for whom alone it is intended, and that no one seize what was not meant for them." (mail disclaimer, AD 1083)
Received on Monday, 4 January 2016 10:45:19 UTC