Re: Subsetting data

From: Peter Baumann <p.baumann@jacobs-university.de>
Date: Fri, 1 Jan 2016 14:44:04 +0100
To: Phil Archer <phila@w3.org>, Rob Atkinson <rob@metalinkage.com.au>, <Simon.Cox@csiro.au>, <koubarak@di.uoa.gr>, <public-sdw-comments@w3.org>, <amgreiner@lbl.gov>, <ericphb@gmail.com>, <jeremy.tandy@metoffice.gov.uk>, <public-dwbp-comments@w3.org>
Message-ID: <568682A4.6000308@jacobs-university.de>

On 2016-01-01 10:17, Phil Archer wrote:
> On 31/12/2015 02:07, Rob Atkinson wrote:
>> I'm not a strong set-theoretician - but it strikes me there are some
>> tensions here:
>> Does the identifier of a set mean that the members of that set are
>> constant, known in advance and always retrievable?
> Nope. You can just as easily have a stable ID for a 'latest version' or
> 'current value'. There should also be an ID for an immutable version. See
> http://www.w3.org/TR/dwbp/#VersionIdentifiers.
>> Is a query endpoint a
>> resource (does either URI or URL have meaning against a query that delivers
>> real time data - including the use case of "at this point in time we think
>> these things are members of this set?" )
>> If the subset is the result of a query - and you care that it is the same
>> subset another time you look at it - are you actually assigning an
>> identifier to the artefact - which is the query response, whose properties
>> include the original query, where it was made, and the time it was made?
> For persistence, no specific technology or query should be baked into the URL.
> My example of
> http://weather.example.com/temperature/UK/London/noon/today
> can persist well beyond the lifetime of whatever technology might be installed
> in 2016 as the URL would be interpreted by the software of the day.

actually, this implies a very specific data model: strict hierarchies - in the
end, equivalents of directory hierarchies. This is the FTP age ;-)
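
To make the hierarchy point concrete, here is a minimal sketch (not any
existing service's implementation) of the kind of intermediate layer Phil
describes, turning the example URL into a query. The dimension names in
PATH_DIMENSIONS and the helpers path_to_query and subset are hypothetical,
invented only for illustration; the URL is the one from Phil's example.

```python
from urllib.parse import urlparse

# Assumption for illustration: the URL path encodes these dimensions,
# always in this fixed order. Nothing in the thread defines them.
PATH_DIMENSIONS = ["country", "city", "time_of_day", "day"]


def path_to_query(url):
    """Translate a hierarchical URL into (dataset, {dimension: value})."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if not segments:
        raise ValueError("URL path names no dataset")
    dataset, values = segments[0], segments[1:]
    if len(values) > len(PATH_DIMENSIONS):
        raise ValueError("more path segments than known dimensions")
    # Position alone decides which dimension a value belongs to.
    return dataset, dict(zip(PATH_DIMENSIONS, values))


def subset(records, query):
    """Return the records whose fields match every constraint in the query."""
    return [r for r in records if all(r.get(k) == v for k, v in query.items())]


if __name__ == "__main__":
    url = "http://weather.example.com/temperature/UK/London/noon/today"
    print(path_to_query(url))
```

Because each value is bound to a dimension purely by its position, a subset
such as "noon, all cities" has no URL in this scheme: a level cannot be
skipped without inventing a wildcard convention. That is the directory-like
restriction meant above.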

>> Can you define an ontology for terms like subset, query, response that you
>> all agree on?
> That would be going beyond the current aim as I see it.

hm, how can you do subsetting without having a definition of what it means?


>> I share Phil's implicit concern that subsetting by type with URI patterns
>> may not be universally applicable - IMHO that equates to a "sub-register"
>> pattern, where a set has its members defined by some identifiable process
>> (independent of any query functions available) - which may include explicit
>> subsets - for example by object type, or delegated registration processes.
>> That probably fits the UK implementation better than a query-defined
>> subset.
>> If subsets have some prior meaning - and a query is used to access them
>> from a service endpoint - then the query is a URL that needs to be bound to
>> the object URI. AFAICT that's a very different thing to saying an arbitrary
>> query result defines a subset of data.
>> I think you may, in general, assign an ID to the artefact which is the
>> result of a query at a given time, and if you want to make that into
>> something with more semantics then you need make it into a new type of
>> object which can be described in terms of what it means. I think currently
>> the conversation is conflating these two perspectives of "subset".
> I don't think we're talking about defining any new semantics. URIs are dumb
> strings (if we break that, we break the Web). It's probably good practice to
> include links and relations as others suggest in this thread, so the
> dcterms:isPartOf predicate is clearly useful. The only new semantics I can see
> we may want to think about is some sort of property that can link a persistent
> URI to an ephemeral query that is the current way to return whatever the ID is
> for - but I'd be happy not to do this unless either of the two groups
> discussing this think it necessary.
> Phil.
>> Cheers, and farewell to 2015.
>> Rob Atkinson.
>> On Thu, 31 Dec 2015 at 08:26 <Simon.Cox@csiro.au> wrote:
>>> Another way of looking at it is that a query, encoded as a URI pattern,
>>> defines an implicit set of potential URIs, each of which denotes a subset.
>>> Simon J D Cox
>>> Environmental Informatics
>>> CSIRO Land and Water
>>> E simon.cox@csiro.au T +61 3 9545 2365 M +61 403 302 672
>>> Physical: Central Reception, Bayview Avenue, Clayton, Vic 3168
>>> Deliveries: Gate 3, Normanby Road, Clayton, Vic 3168
>>> Postal: Private Bag 10, Clayton South, Vic 3169
>>> http://people.csiro.au/Simon-Cox
>>> http://orcid.org/0000-0002-3884-3420
>>> http://researchgate.net/profile/Simon_Cox3
>>> ------------------------------
>>> *From:* Phil Archer
>>> *Sent:* Wednesday, 30 December 2015 6:31:16 PM
>>> *To:* Manolis Koubarakis; 'public-sdw-comments@w3.org'; Annette Greiner;
>>> Eric Stephan; Tandy, Jeremy; public-dwbp-comments@w3.org
>>> *Subject:* Subsetting data
>>> At various times in recent months I have promised to look into the topic
>>> of persistent identifiers for subsets of data. This came up at the SDW
>>> F2F in Sapporo but has also been raised by Annette in DWBP. In between
>>> festive activities I've been giving this some thought and have tried to
>>> begin to commit some ideas to a page [1].
>>> During the CEO-LD meeting, Jeremy pointed to OpenSearch as a possible
>>> way forward, including its geo-temporal extensions defined by the OGC.
>>> There is also the Linked Data API as a means of doing this, and what
>>> they both have in common is that they offer an intermediate layer that
>>> turns a URL into a query.
>>> How do you define a persistent identifier for a subset of a dataset? IMO
>>> you mint a URI and say "this identifies a subset of a dataset" - and
>>> then provide a means of programmatically going from the URI to a query
>>> that returns the subset. As long as you can replace the intermediate
>>> layer with another one that also returns the same subset, we're done.
>>> The UK Government Linked Data examples tend to be along the lines of:
>>> http://transport.data.gov.uk/id/stations
>>> returns a list of all stations in Britain.
>>> http://transport.data.gov.uk/id/stations/Manchester
>>> returns a list of stations in Manchester
>>> http://transport.data.gov.uk/id/stations/Manchester/Piccadilly
>>> identifies Manchester Piccadilly station.
>>> All of that data of course comes from a single dataset.
>>> Does this work in the real worlds of meteorology and LBL/PNNL?
>>> Phil.
>>> [1] https://github.com/w3c/sdw/blob/gh-pages/subsetting/index.md
>>> -- 
>>> Phil Archer
>>> W3C Data Activity Lead
>>> http://www.w3.org/2013/data/
>>> http://philarcher.org
>>> +44 (0)7887 767755
>>> @philarcher1

Dr. Peter Baumann
 - Professor of Computer Science, Jacobs University Bremen
   mail: p.baumann@jacobs-university.de
   tel: +49-421-200-3178, fax: +49-421-200-493178
 - Executive Director, rasdaman GmbH Bremen (HRB 26793)
   www.rasdaman.com, mail: baumann@rasdaman.com
   tel: 0800-rasdaman, fax: 0800-rasdafax, mobile: +49-173-5837882
"Si forte in alienas manus oberraverit hec peregrina epistola incertis ventis dimissa, sed Deo commendata, precamur ut ei reddatur cui soli destinata, nec preripiat quisquam non sibi parata." (mail disclaimer, AD 1083)
Received on Friday, 1 January 2016 13:44:43 UTC