- From: Phil Archer <phila@w3.org>
- Date: Fri, 1 Jan 2016 09:33:25 +0000
- To: Frans Knibbe <frans.knibbe@geodan.nl>
- Cc: Manolis Koubarakis <koubarak@di.uoa.gr>, "public-sdw-comments@w3.org" <public-sdw-comments@w3.org>, Annette Greiner <amgreiner@lbl.gov>, Eric Stephan <ericphb@gmail.com>, "Tandy, Jeremy" <jeremy.tandy@metoffice.gov.uk>, public-dwbp-comments@w3.org
On 31/12/2015 10:54, Frans Knibbe wrote:
> Phil,
>
> Thank you for bringing up an interesting subject at a time where not much
> seems to be going on.
>
> I think a key question is: Which data should be returned when a dataset URI
> is dereferenced?
>
> And I think the answer should be: at least the metadata describing the
> dataset or the subset, and optionally the actual data.

I'd say: If I ask for the current temperature in Amsterdam, that's what I want. Good practice would be to include metadata, or links to it (dcterms:isPartOf <allTemperaturesInNL>).

I don't disagree that those things are important, metadata clearly is - and goodness knows I like links :-) It's a scoping/capacity to deliver question.

Phil

>
> When discussing datasets and subsets it is good to look at the Vocabulary
> of Interlinked Datasets (VoID) <http://www.w3.org/TR/void/>, although its
> scope could be too narrow because it is intended to be used for RDF data.
> It can be used to make clear that a chunk of data describes a dataset
> (void:Dataset <http://rdfs.org/ns/void#Dataset>) and has subsets (void:subset
> <http://www.w3.org/TR/void/#subset>). The Data Catalog Vocabulary
> <http://www.w3.org/TR/vocab-dcat/> has a broader scope (it can be used for
> any dataset) and has its own definition of a dataset (dcat:Dataset
> <http://www.w3.org/ns/dcat#Dataset>). DCAT does not seem to have a way of
> identifying subsets, but I guess dcterms:hasPart
> <http://purl.org/dc/terms/hasPart> and dcterms:isPartOf
> <http://purl.org/dc/terms/isPartOf> can be used to express parent-child
> relationships between data collections (dataset mereology).
>
> So let's assume it is possible to indicate that a set of data describe a
> dataset and that it is possible to express in a general way that the
> dataset is a subset of a parent dataset and itself is the parent of a
> collection of subsets.
> The data that are returned when a dataset URI is
> dereferenced could then include:
>
>    - A link to the parent dataset (if there is one)
>    - Links to child datasets (if they exist)
>    - Descriptions of how to get the actual data (if they are not included
>    in the response), for example the URI of a SPARQL endpoint or the URIs of
>    other standard web APIs
>    - Other general metadata, like spatial extent, temporal extent, human
>    readable labels, subject(s), etc.
>    - The actual data from the dataset
>
> A recommendation or good practice could be to include the actual data OR
> point to subsets. That way there is never a dead end when links are
> followed. A data provider could decide the best level of a subset returning
> actual data, for example when the amount of data is manageable.
>
> What I particularly like about this approach is that if the data server
> supports HTML (or another format that is supported by web crawlers), we
> will have satisfied the crawlability requirement
> <http://www.w3.org/TR/sdw-ucr/#Crawlability> and the discoverability
> requirement <http://www.w3.org/TR/sdw-ucr/#Discoverability>. A web crawler
> could use any dataset URI as a starting point and by recursively visiting
> all links always have access to the complete dataset, in a way that does
> not require any fancy querying. I hope the search engine people (Ed,
> Charles) can confirm this...
>
> Another thing I like about this approach is that the spatial properties of
> a dataset can be helpful in partitioning a dataset into manageable subsets. An
> obvious method would be to use administrative (mereological) relationships:
> A European dataset has a subset for each country, a country dataset has
> subsets for each province, and so on. If that possibility is absent it
> should always be possible to use a tiling mechanism to partition the
> dataset into subsets. I like to think of this as a nice example of how
> geospatial practice can be beneficial to the Web as a whole.
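To make the "data OR subsets, never a dead end" idea concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration: the example.org URIs, the DatasetDescription shape, and the in-memory WEB dictionary standing in for real HTTP dereferencing. The parent and subsets fields loosely correspond to dcterms:isPartOf and dcterms:hasPart.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DatasetDescription:
    """Hypothetical shape of what dereferencing a dataset URI returns."""
    uri: str
    label: str
    parent: Optional[str] = None                  # cf. dcterms:isPartOf
    subsets: list = field(default_factory=list)   # cf. dcterms:hasPart
    data: Optional[list] = None                   # actual data, if included


BASE = "http://example.org/dataset/"  # made-up namespace

# Stand-in for the Web: URI -> description (illustrative records only).
WEB = {
    BASE + "europe": DatasetDescription(
        BASE + "europe", "Europe", subsets=[BASE + "nl"]),
    BASE + "nl": DatasetDescription(
        BASE + "nl", "Netherlands", parent=BASE + "europe",
        subsets=[BASE + "nl/noord-holland", BASE + "nl/utrecht"]),
    BASE + "nl/noord-holland": DatasetDescription(
        BASE + "nl/noord-holland", "Noord-Holland", parent=BASE + "nl",
        data=[{"place": "Amsterdam"}]),
    BASE + "nl/utrecht": DatasetDescription(
        BASE + "nl/utrecht", "Utrecht", parent=BASE + "nl",
        data=[{"place": "Utrecht"}]),
}


def dereference(uri):
    """Look up a description; a real client would do an HTTP GET here."""
    return WEB[uri]


def is_dead_end(description):
    """A dead end offers neither actual data nor links to subsets."""
    return description.data is None and not description.subsets


def crawl(start_uri):
    """Follow subset links from start_uri and collect all actual data."""
    seen, records, stack = set(), [], [start_uri]
    while stack:
        uri = stack.pop()
        if uri in seen:
            continue
        seen.add(uri)
        description = dereference(uri)
        if is_dead_end(description):
            raise ValueError("dead end at " + uri)
        if description.data:
            records.extend(description.data)
        stack.extend(description.subsets)
    return records
```

Calling crawl(BASE + "europe") walks down to the province level and gathers the records found there, using nothing but link following.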
>
> By the way, I would like to look at the transport.data.gov.uk examples, but
> I get 404s.
>
> Regards,
> Frans
>
> 2015-12-30 19:31 GMT+01:00 Phil Archer <phila@w3.org>:
>
>> At various times in recent months I have promised to look into the topic
>> of persistent identifiers for subsets of data. This came up at the SDW F2F
>> in Sapporo but has also been raised by Annette in DWBP. In between festive
>> activities I've been giving this some thought and have tried to begin to
>> commit some ideas to a page [1].
>>
>> During the CEO-LD meeting, Jeremy pointed to OpenSearch as a possible way
>> forward, including its geo-temporal extensions defined by the OGC. There is
>> also the Linked Data API as a means of doing this, and what they both have
>> in common is that they offer an intermediate layer that turns a URL into a
>> query.
>>
>> How do you define a persistent identifier for a subset of a dataset? IMO
>> you mint a URI and say "this identifies a subset of a dataset" - and then
>> provide a means of programmatically going from the URI to a query that
>> returns the subset. As long as you can replace the intermediate layer with
>> another one that also returns the same subset, we're done.
>>
>> The UK Government Linked Data examples tend to be along the lines of:
>>
>> http://transport.data.gov.uk/id/stations
>> returns a list of all stations in Britain.
>>
>> http://transport.data.gov.uk/id/stations/Manchester
>> returns a list of stations in Manchester
>>
>> http://transport.data.gov.uk/id/stations/Manchester/Piccadilly
>> identifies Manchester Piccadilly station.
>>
>> All of that data of course comes from a single dataset.
>>
>> Does this work in the real worlds of meteorology and UBL/PNNL?
>>
>> Phil.
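The "intermediate layer that turns a URL into a query" can be sketched roughly as below. This is only an illustration under stated assumptions: the function names uri_to_query and resolve, the city/name field names, and the three-record STATIONS list are all made up, and it is not how the Linked Data API or OpenSearch actually work. The URIs are used purely as strings to parse; nothing is fetched.

```python
from urllib.parse import urlparse, unquote

# Illustrative stand-in for the single underlying dataset.
STATIONS = [
    {"city": "Manchester", "name": "Piccadilly"},
    {"city": "Manchester", "name": "Victoria"},
    {"city": "Leeds", "name": "Leeds"},
]

# Path segments after /id/stations map, in order, to these (assumed) fields.
FIELDS = ["city", "name"]


def uri_to_query(uri):
    """Turn a subset URI into a query, here a dict of field -> value."""
    parts = [unquote(p) for p in urlparse(uri).path.split("/") if p]
    if parts[:2] != ["id", "stations"]:
        raise ValueError("not a station subset URI: " + uri)
    values = parts[2:]
    if len(values) > len(FIELDS):
        raise ValueError("too many path segments: " + uri)
    return dict(zip(FIELDS, values))


def resolve(uri):
    """Run the query derived from the URI against the dataset."""
    query = uri_to_query(uri)
    return [s for s in STATIONS
            if all(s[k] == v for k, v in query.items())]
```

The point of the split is that resolve() could be swapped for a SPARQL or OpenSearch back end; as long as the same URI still yields the same subset, the identifier stays persistent.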
>>
>> [1] https://github.com/w3c/sdw/blob/gh-pages/subsetting/index.md

--
Phil Archer
W3C Data Activity Lead
http://www.w3.org/2013/data/

http://philarcher.org
+44 (0)7887 767755
@philarcher1
Received on Friday, 1 January 2016 09:32:51 UTC