- From: Rob Atkinson <rob@metalinkage.com.au>
- Date: Fri, 01 Jan 2016 23:12:36 +0000
- To: Peter Baumann <p.baumann@jacobs-university.de>, Phil Archer <phila@w3.org>, Manolis Koubarakis <koubarak@di.uoa.gr>, "public-sdw-comments@w3.org" <public-sdw-comments@w3.org>, Annette Greiner <amgreiner@lbl.gov>, Eric Stephan <ericphb@gmail.com>, "Tandy, Jeremy" <jeremy.tandy@metoffice.gov.uk>, public-dwbp-comments@w3.org
- Message-ID: <CACfF9LzfGwMosZ1xOd7ZMaAuMq1JxYs8EDg0DUYAyg_VOOy5UQ@mail.gmail.com>
From my reading of this conversation - as someone on the fringes who has played with a fair bit of the implementation practicalities, but is primarily interested in identifying and promulgating best practices to a wider audience - there is a lot of discussion about the actual meanings of terms. So you may not want to develop a formal ontology, but you will need to define these terms before anyone can make sense of the discussion.

In particular, I still feel there is not a fully developed consensus, or consistent terminology usage, around the distinction between:

1) a conceptual query that embodies specific semantics, such as "the latest reported temperature of the air using methodology X at location Y" - obviously such things need identifiers so we can repeat them and attach useful metadata to them

2) the results of such a query (in this case a subset of the air temperature reading record set, starting from T1 and updated every T minutes)

I think there is a consensus that the actual query mechanism should be decoupled by URI dereferencing, and not be part of the URI. I am not sure I can see a consensus regarding the endpoint of the query. If this is part of the dereferencing, so that dereferencing results in a composite entity (the endpoint, the actual query used at that endpoint, and the result returned), then perhaps those things are all properties of the "subset"? Of course we need to separate the query and the result, because the result may be huge, and we may invoke the query at a different time to when we retrieved it.

At this point I could model a system, but I would struggle to know exactly what terms the WG is using for the different parts of the puzzle, and would want to cite those terms... Hmm... during implementation, being able to cite the elements of this information model via a URI would be important.
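To make the distinction concrete, here is a minimal Python sketch of one way the pieces could be kept apart. It is only an illustration under my own assumptions, not terminology or a design the WG has agreed: the class name `SubsetDescription`, the `REGISTRY` mapping, the `example.org` URIs and the `ex:` query vocabulary are all hypothetical. Dereferencing the identifier yields a description (endpoint plus concrete query), and the potentially huge result is fetched separately, at whatever time the query is invoked.

```python
from dataclasses import dataclass
from string import Template
from typing import Optional


@dataclass(frozen=True)
class SubsetDescription:
    """What dereferencing a subset identifier yields: metadata, not the result."""
    identifier: str  # persistent URI of the conceptual query
    label: str       # the semantics the identifier stands for
    endpoint: str    # hypothetical service able to run the query
    query: str       # concrete query to send to that endpoint


# Hypothetical registry: identifier -> (label, endpoint, query template).
# Swapping the endpoint or the template here leaves the identifier unchanged.
REGISTRY = {
    "http://example.org/id/subset/latest-air-temp": (
        "Latest reported air temperature using methodology X at location Y",
        "http://example.org/sparql",
        "SELECT ?value WHERE { ?obs ex:method ex:$method ; "
        "ex:location ex:$location ; ex:time ?time ; ex:value ?value } "
        "ORDER BY DESC(?time) LIMIT 1",
    ),
}


def dereference(uri: str, **params: str) -> Optional[SubsetDescription]:
    """Resolve a subset identifier to its description, or None if unknown."""
    entry = REGISTRY.get(uri)
    if entry is None:
        return None
    label, endpoint, template = entry
    # Fill the template placeholders; the query never appears in the URI itself.
    query = Template(template).substitute(params)
    return SubsetDescription(uri, label, endpoint, query)


description = dereference(
    "http://example.org/id/subset/latest-air-temp", method="X", location="Y"
)
```

The identifier stays stable while the registry entry behind it can be replaced, which is the decoupling discussed above; what to call each of these parts is exactly the open terminology question.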
So if the WG isn't able to define such an ontology, but one is needed to implement a system that implements the reasonably complex semantics involved, who would develop such an ontology? How do you stop N ad-hoc ontologies emerging from N implementations of these best practices?

For the record, I have played with integrating VoID, RDF Data Cube, Linked Data API and IETF URL templating, and have been able to handle the dereferencing aspects and have all the metadata accessible - but one thing missing is a lightweight ontology to define whether an endpoint returns a subset of a resource, and what type of subset. VoID supports type-based partitioning as well as overlapping subsets, but this isn't quite powerful enough to handle the sort of use cases here. You could perhaps interpret the existence of an RDF-QB dimension description attached to an endpoint as an implicit statement that the endpoint provides subsetting on dimensions - but would that scale to handle cases where subsets are well known and QB is overkill and too high an entry bar?

Cheers
Rob Atkinson

On Sat, 2 Jan 2016 at 01:53 Peter Baumann <p.baumann@jacobs-university.de> wrote:

> have added comments and filled placeholders. As I do not have write
> permissions this has created a fork:
> https://github.com/w3c/sdw/compare/gh-pages...pebau:patch-1
> -Peter
>
> On 2015-12-30 19:31, Phil Archer wrote:
> > At various times in recent months I have promised to look into the topic of
> > persistent identifiers for subsets of data. This came up at the SDW F2F in
> > Sapporo but has also been raised by Annette in DWBP. In between festive
> > activities I've been giving this some thought and have tried to begin to
> > commit some ideas to a page [1].
> >
> > During the CEO-LD meeting, Jeremy pointed to OpenSearch as a possible way
> > forward, including its geo-temporal extensions defined by the OGC. There is
> > also the Linked Data API as a means of doing this, and what they both have in
> > common is that they offer an intermediate layer that turns a URL into a query.
> >
> > How do you define a persistent identifier for a subset of a dataset? IMO you
> > mint a URI and say "this identifies a subset of a dataset" - and then provide
> > a means of programmatically going from the URI to a query that returns the
> > subset. As long as you can replace the intermediate layer with another one
> > that also returns the same subset, we're done.
> >
> > The UK Government Linked Data examples tend to be along the lines of:
> >
> > http://transport.data.gov.uk/id/stations
> > returns a list of all stations in Britain.
> >
> > http://transport.data.gov.uk/id/stations/Manchester
> > returns a list of stations in Manchester
> >
> > http://transport.data.gov.uk/id/stations/Manchester/Piccadilly
> > identifies Manchester Piccadilly station.
> >
> > All of that data of course comes from a single dataset.
> >
> > Does this work in the real worlds of meteorology and UBL/PNNL?
> >
> > Phil.
> >
> > [1] https://github.com/w3c/sdw/blob/gh-pages/subsetting/index.md
>
> --
> Dr. Peter Baumann
> - Professor of Computer Science, Jacobs University Bremen
>   www.faculty.jacobs-university.de/pbaumann
>   mail: p.baumann@jacobs-university.de
>   tel: +49-421-200-3178, fax: +49-421-200-493178
> - Executive Director, rasdaman GmbH Bremen (HRB 26793)
>   www.rasdaman.com, mail: baumann@rasdaman.com
>   tel: 0800-rasdaman, fax: 0800-rasdafax, mobile: +49-173-5837882
> "Si forte in alienas manus oberraverit hec peregrina epistola incertis
> ventis dimissa, sed Deo commendata, precamur ut ei reddatur cui soli
> destinata, nec preripiat quisquam non sibi parata." (mail disclaimer, AD
> 1083)
Received on Friday, 1 January 2016 23:13:28 UTC