
Re: Subsetting data

From: Peter Baumann <p.baumann@jacobs-university.de>
Date: Thu, 31 Dec 2015 15:08:01 +0100
To: Makx Dekkers <mail@makxdekkers.com>, 'Dan Brickley' <danbri@google.com>, 'Clemens Portele' <portele@interactive-instruments.de>, 'Rob Atkinson' <rob@metalinkage.com.au>
CC: 'Phil Archer' <phila@w3.org>, 'Simon Cox' <Simon.Cox@csiro.au>, <amgreiner@lbl.gov>, <ericphb@gmail.com>, <jeremy.tandy@metoffice.gov.uk>, <koubarak@di.uoa.gr>, <public-dwbp-comments@w3.org>, <public-sdw-comments@w3.org>
Message-ID: <568536C1.8000108@jacobs-university.de>
Good to carve out this aspect. Actually, this is independent of geospatial
data; microcitations into papers, e.g., are a similar thing.
Another thought: URIs identify, but do not explain. Identifying by query means
that the history of deriving the "subset" is stamped into the URI. Given the
concerns expressed in the thread, something stable (i.e., leading to
reproducible results) might be used as an identifier instead. Hence: why not
assign some OID / DOI / whatever to a query result (i) once it is generated and
(ii) upon explicit user request?
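
A minimal sketch of that idea, assuming the query result is available as a list of rows. The function name mint_snapshot_id, the urn:example: prefix and the record fields are hypothetical placeholders, not an existing OID or DOI service. The identifier is derived from the frozen result itself, while the query, the endpoint and the retrieval time are kept as descriptive properties rather than stamped into the identifier:

```python
from __future__ import annotations

import hashlib
import json
from datetime import datetime, timezone


def mint_snapshot_id(query: str, endpoint: str, rows: list[dict]) -> dict:
    """Describe a frozen query result and derive a stable identifier for it.

    Hypothetical helper: a real deployment would register the record with
    an OID/DOI authority instead of building a urn:example: string.
    """
    # Canonical serialisation (sorted keys, fixed separators) so that the
    # same rows always hash to the same digest. Row order is significant.
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {
        "id": f"urn:example:snapshot:{digest[:16]}",  # placeholder scheme
        "query": query,        # how the subset was derived
        "endpoint": endpoint,  # where the query was made
        "retrieved": datetime.now(timezone.utc).isoformat(),  # when
        "sha256": digest,
    }
```

Calling it only when a user asks for a citable result covers point (ii); re-running the same query later against changed data yields a different identifier, which is the reproducibility property asked for here.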
-Peter


On 2015-12-31 14:35, Makx Dekkers wrote:
>
> Interesting discussion.
>
>  
>
> It seems to me that there are (at least) two different types of subset which I
> think is what Rob addressed:
>
>  
>
> 1.       Subsets that are in some way stable – for example a worksheet in a
> spreadsheet workbook has identity, a name and other more or less fixed
> characteristics.
>
> 2.       Subsets that are ephemeral – as in the case of query results
>
>  
>
> For the first type, URIs make sense; a subset of a dataset could be seen as
> just another dataset that is related with an isPartOf relationship to the
> bigger dataset.
>
> By the way, I think that URI patterns may help a publisher to generate the URI
> but it won’t help the user to understand how it relates to a bigger entity –
> you cannot expect standard behaviour across publishers.
>
>  
>
> For the second type of subset, I agree with Dan that a URI for something that
> is not stable is of questionable value. In that case, wouldn’t the URI
> identify the query rather than its results?
>
>  
>
> Makx.
>
>  
>
>  
>
>  
>
> *From:*Dan Brickley [mailto:danbri@google.com]
> *Sent:* 31 December 2015 12:09
> *To:* Clemens Portele <portele@interactive-instruments.de>; Rob Atkinson
> <rob@metalinkage.com.au>
> *Cc:* Phil Archer <phila@w3.org>; Simon Cox <Simon.Cox@csiro.au>;
> amgreiner@lbl.gov; ericphb@gmail.com; jeremy.tandy@metoffice.gov.uk;
> koubarak@di.uoa.gr; public-dwbp-comments@w3.org; public-sdw-comments@w3.org
> *Subject:* Re: Subsetting data
>
>  
>
>  
>
> Isn't a "subset" just a query result, of which there are effectively an
> unlimited number?
>
>  
>
> Storing a query so it can be re-run against evolving data has value. Having a
> URI for that, perhaps less so.
>
>  
>
> Dan
>
> On Thu, 31 Dec 2015, 08:14 Clemens Portele
> <portele@interactive-instruments.de> wrote:
>
>     Rob, 
>
>      
>
>     what you describe seems to apply to the dataset (resource) the same way it
>     would apply to any subset resource. I.e. are you discussing a more general
>     question, not the subsetting question?
>
>      
>
>     Phil,
>
>      
>
>     a (probably often unproblematic) restriction to the temperature/uk/london
>     or stations/manchester approach is that there is only one path, so you end
>     up with limitations on the subsets. If you want to support multiple
>     subsets, e.g. also stations where high speed trains stop, stations that
>     have a ticket shop, etc. then there are several issues with a
>     /{dataset}/{subset}/…/{subset}/{object} approach. These include an unclear
>     URI scheme ("manchester" and "eurostar" would be on the same path level),
>     potential name collisions of subset names of different subsetting
>     categories, and multiple URIs for the same feature/object.
>
>      
>
>     Best regards,
>
>     Clemens
>
>      
>
>      
>
>         On 31 Dec 2015, at 03:07, Rob Atkinson <rob@metalinkage.com.au>
>         wrote:
>
>          
>
>         I'm not a strong set-theoretician - but it strikes me there are some
>         tensions here:
>
>          
>
>         Does the identifier of a set mean that the members of that set are
>         constant, known in advance and always retrievable? Is a query
>         endpoint a resource (does either URI or URL have meaning against a
>         query that delivers real-time data - including the use case of "at
>         this point in time we think these things are members of this set")?
>
>          
>
>         If the subset is the result of a query - and you care that it is the
>         same subset another time you look at it - are you actually assigning
>         an identifier to the artefact - which is the query response, whose
>         properties include the original query, where it was made, and the time
>         it was made?
>
>          
>
>         Can you define an ontology for terms like subset, query, response that
>         you all agree on?
>
>          
>
>         I share Phil's implicit concern that subsetting by type with URI
>         patterns may not be universally applicable - IMHO that equates to a
>         "sub-register" pattern, where a set has its members defined by some
>         identifiable process (independent of any query functions available) -
>         which may include explicit subsets - for example by object type, or
>         delegated registration processes. That probably fits the UK
>         implementation better than a query-defined subset. 
>
>          
>
>         If subsets have some prior meaning - and a query is used to access
>         them from a service endpoint - then the query is a URL that needs to be
>         bound to the object URI. AFAICT that's a very different thing to saying
>         an arbitrary query result defines a subset of data. 
>
>          
>
>         I think you may, in general, assign an ID to the artefact which is the
>         result of a query at a given time, and if you want to make that into
>         something with more semantics then you need to make it into a new type of
>         object which can be described in terms of what it means. I think
>         currently the conversation is conflating these two perspectives of
>         "subset".
>
>          
>
>         Cheers, and farewell to 2015.
>
>         Rob Atkinson.
>
>          
>
>          
>
>          
>
>          
>
>         On Thu, 31 Dec 2015 at 08:26 <Simon.Cox@csiro.au> wrote:
>
>             Another way of looking at it is that a query, encoded as a URI
>             pattern, defines an implicit set of potential URIs, each of which
>             denotes a subset.
>
>             Simon J D Cox
>             Environmental Informatics
>             CSIRO Land and Water
>
>             E simon.cox@csiro.au T +61 3 9545 2365
>             M +61 403 302 672
>             Physical: Central Reception, Bayview Avenue, Clayton, Vic 3168
>             Deliveries: Gate 3, Normanby Road, Clayton, Vic 3168
>             Postal: Private Bag 10, Clayton South, Vic 3169
>             http://people.csiro.au/Simon-Cox
>             http://orcid.org/0000-0002-3884-3420
>             http://researchgate.net/profile/Simon_Cox3
>
>             --------------------------------------------------------------------------------
>
>             *From:*Phil Archer
>             *Sent:* Wednesday, 30 December 2015 6:31:16 PM
>             *To:* Manolis Koubarakis; public-sdw-comments@w3.org; Annette
>             Greiner; Eric Stephan; Tandy, Jeremy; public-dwbp-comments@w3.org
>             *Subject:* Subsetting data
>
>             At various times in recent months I have promised to look into the
>             topic of persistent identifiers for subsets of data. This came up
>             at the SDW F2F in Sapporo but has also been raised by Annette in
>             DWBP. In between festive activities I've been giving this some
>             thought and have tried to begin to commit some ideas to a page [1].
>
>             During the CEO-LD meeting, Jeremy pointed to OpenSearch as a possible
>             way forward, including its geo-temporal extensions defined by the
>             OGC. There is also the Linked Data API as a means of doing this, and
>             what they both have in common is that they offer an intermediate
>             layer that turns a URL into a query.
>
>             How do you define a persistent identifier for a subset of a
>             dataset? IMO you mint a URI and say "this identifies a subset of a
>             dataset" - and then provide a means of programmatically going from
>             the URI to a query that returns the subset. As long as you can
>             replace the intermediate layer with another one that also returns
>             the same subset, we're done.
>
>             The UK Government Linked Data examples tend to be along the lines of:
>
>             http://transport.data.gov.uk/id/stations
>             returns a list of all stations in Britain.
>
>             http://transport.data.gov.uk/id/stations/Manchester
>             returns a list of stations in Manchester
>
>             http://transport.data.gov.uk/id/stations/Manchester/Piccadilly
>             identifies Manchester Piccadilly station.
>
>             All of that data of course comes from a single dataset.
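
The "intermediate layer" in these examples can be sketched roughly as follows. This is an illustration only, not how transport.data.gov.uk or the Linked Data API is implemented: the STATIONS rows are made-up sample data standing in for the single underlying dataset, and resolve is a hypothetical name. Each extra path segment after /id/stations is read as one more filter criterion:

```python
from __future__ import annotations

from urllib.parse import urlparse

# Made-up sample rows standing in for the single underlying dataset.
STATIONS = [
    {"city": "Manchester", "name": "Piccadilly"},
    {"city": "Manchester", "name": "Victoria"},
    {"city": "Leeds", "name": "Leeds"},
]


def resolve(uri: str) -> list[dict]:
    """Turn a /id/stations[/{city}[/{name}]] URI into a filter over STATIONS."""
    segments = [s for s in urlparse(uri).path.split("/") if s]
    if segments[:2] != ["id", "stations"] or len(segments) > 4:
        raise ValueError(f"unsupported URI pattern: {uri}")
    # Map the remaining path segments onto filter keys, in path order.
    criteria = dict(zip(("city", "name"), segments[2:]))
    return [row for row in STATIONS
            if all(row[key] == value for key, value in criteria.items())]
```

With the sample rows, http://transport.data.gov.uk/id/stations/Manchester resolves to the two Manchester entries. Replacing the list filter with a SPARQL or SQL query would not change the URIs, which is the property the persistent identifier relies on.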
>
>             Does this work in the real worlds of meteorology and LBL/PNNL?
>
>             Phil.
>
>
>
>
>             [1] https://github.com/w3c/sdw/blob/gh-pages/subsetting/index.md
>
>
>
>
>             -- 
>
>
>             Phil Archer
>             W3C Data Activity Lead
>             http://www.w3.org/2013/data/
>
>             http://philarcher.org
>             +44 (0)7887 767755
>             @philarcher1
>
>      
>

-- 
Dr. Peter Baumann
 - Professor of Computer Science, Jacobs University Bremen
   www.faculty.jacobs-university.de/pbaumann
   mail: p.baumann@jacobs-university.de
   tel: +49-421-200-3178, fax: +49-421-200-493178
 - Executive Director, rasdaman GmbH Bremen (HRB 26793)
   www.rasdaman.com, mail: baumann@rasdaman.com
   tel: 0800-rasdaman, fax: 0800-rasdafax, mobile: +49-173-5837882
"Si forte in alienas manus oberraverit hec peregrina epistola incertis ventis dimissa, sed Deo commendata, precamur ut ei reddatur cui soli destinata, nec preripiat quisquam non sibi parata." (mail disclaimer, AD 1083)
Received on Thursday, 31 December 2015 14:08:44 UTC
