- From: Phil Archer <phila@w3.org>
- Date: Sun, 11 Oct 2015 19:01:33 +0100
- To: Makx Dekkers <mail@makxdekkers.com>, public-dwbp-comments@w3.org
- Cc: 'Erik Wilde' <dret@berkeley.edu>, "'Tandy, Jeremy'" <jeremy.tandy@metoffice.gov.uk>
Switching to the public comments list as Jeremy is included in the thread and I refer to him later below.

Thanks Makx, pls see inline.

On 11/10/2015 10:47, Makx Dekkers wrote:
> Hi Phil,
>
> Useful thoughts on URIs within datasets.
>
> If I understand correctly, you’re addressing two separate issues:
>
> 1. Identification of real-world entities (people, places, concepts etc.) that are used within the data using well-known URIs, e.g. if some data is related to a particular city, use a well-known URI to refer to the city.

Exactly so, yes.

> One thing that we need to think about is that it is not so easy to find such URIs. It's more complicated than finding terms in ‘predicate vocabularies’ (e.g. using LOV) because there is much more duplication. For example, I would not be able to tell you what is the ‘best’ URI to identify me; would you use the one that I minted and maintain myself, or would it be better to use one that is maintained by a trusted organisation?

I can't answer that either of course. I have a URI for myself which I wish were different but I've had it a long time. I can hear Ivan Herman et al even now saying 'ORCID' ;-) But for other things, it's certainly hard.

> You refer to http://www.w3.org/TR/ld-bp/#how-to-find-existing-vocabularies but that seems to be mostly relevant for 'predicate vocabularies', not for finding URIs for people etc. In some work I was involved in we tried to rank external URIs in dimensions like well maintained, widely used, freely available, global vs. regional scope, mono- vs. multilingual descriptions -- with decisions driven by intended use and intended audience.

Interesting. Is that available? Could we offer those as metrics do you think?

> This first issue is basically concerned with pointing from inside data to the outside world. The second issue is the other way around, pointing from the outside in.
>
> 2. Identification of parts of a dataset.
> I think that is what you mean by ‘data point’ but maybe that term is not the best, as it seems to imply some numerical value for an observation. I myself would favour a term like ‘part of a dataset’ or ‘data item’.

Of your two terms I prefer data item. Just as you say above, I mean use a URI for things like a city. If data point implies a numerical value then I mean data item.

> On this second issue, we may need to include some warnings. In some cases, a part of a dataset by itself may not be understandable without access to information about the dataset as a whole; e.g. for an observation, you may need to know how and why it was observed; for an article in a law, you may need to know what a particular term means in this specific context.

Yes. A simple data item such as "age: 37" is certainly meaningless without context.

I think you're talking here about the issue of defining a subset of a dataset. So if the dataset is 'all temperature records for Spanish cities since they began,' I might just want the subset about Barcelona in 2014. This is an issue that has come to the fore in the Spatial Data on the Web WG, in particular in the context of satellite imagery (the data volume of which is enormous). As things stand, the SDW WG is likely to tackle this since geo and temporal restrictions are often exactly the kind of subset that's required (my example about Barcelona temperature records was not chosen at random).

Jeremy suggested that a likely starting point would be Open Search (http://www.opensearch.org/) with the geotemporal extensions defined by OGC (http://www.opengeospatial.org/standards/opensearchgeo). I can't pretend I've looked at these in any kind of detail but, if we're talking about the same thing, then most definitely yes: the subset would need to include all the contextualising metadata needed to make sense of it.

Would that cover your second issue?

Phil.
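To make the point about subsets concrete, here is a minimal sketch of a geo/temporal subset that carries its parent dataset's contextual metadata along with it. Everything in it is an assumption for illustration: the dataset URI, the field names (`context`, `derived_from`, `filter`) and the temperature values are made up, and it does not follow OpenSearch, the OGC extensions or any other specification.

```python
# Hypothetical dataset: the URI, field names and values are illustrative only.
DATASET = {
    "id": "http://example.org/dataset/spanish-city-temperatures",
    "context": {
        "unit": "degrees Celsius",
        "method": "annual mean of daily readings",
    },
    "records": [
        {"city": "Barcelona", "year": 2014, "value": 17.9},
        {"city": "Barcelona", "year": 2013, "value": 17.1},
        {"city": "Madrid", "year": 2014, "value": 15.6},
    ],
}


def subset(dataset, city, year):
    """Select records by city and year, keeping the context needed to read them."""
    records = [
        r for r in dataset["records"]
        if r["city"] == city and r["year"] == year
    ]
    return {
        # Pointer back to the dataset the subset was taken from.
        "derived_from": dataset["id"],
        # Copy of the contextualising metadata, so the subset stands alone.
        "context": dict(dataset["context"]),
        # The restriction that defines this subset.
        "filter": {"city": city, "year": year},
        "records": records,
    }


barcelona_2014 = subset(DATASET, "Barcelona", 2014)
```

The design choice shown is simply that the subset is never returned as bare values: it always travels with the unit, the method and a pointer to the dataset it came from.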
> One approach would certainly be to create URIs that are in some way derived from the dataset URI, which I understand is the approach of CSVW at http://www.w3.org/TR/tabular-metadata/#uri-template-properties. However, in the absence of a ‘standard’ way of creating ‘item URIs’ from dataset URIs, it may not be possible to know what the dataset URI is from looking at the item URI, at least not in a machine-readable way.
>
> So in summary, I think that advice could be given, but I think that there need to be two separate BPs for them.
>
> Makx.

--
Phil Archer
W3C Data Activity Lead
http://www.w3.org/2013/data/

http://philarcher.org
+44 (0)7887 767755
@philarcher1
Received on Sunday, 11 October 2015 18:01:37 UTC