Re: Webby Data from Phil Archer on 2015-10-11 (public-dwbp-comments@w3.org from October 2015)

From: Phil Archer <phila@w3.org>
Date: Sun, 11 Oct 2015 19:01:33 +0100
To: Makx Dekkers <mail@makxdekkers.com>, public-dwbp-comments@w3.org
Cc: 'Erik Wilde' <dret@berkeley.edu>, "'Tandy, Jeremy'" <jeremy.tandy@metoffice.gov.uk>
Message-ID: <561AA3FD.6070505@w3.org>
Switching to public comments list as Jeremy is included in the thread 
and I refer to him later below.

Thanks Makx, pls see inline.

On 11/10/2015 10:47, Makx Dekkers wrote:
> Hi Phil,
>
> Useful thoughts on URIs within datasets.
>
> If I understand correctly, you’re addressing two separate issues:
>
> 1. Identification of real-world entities (people, places, concepts etc.) that are used within the data using well-known URIs, e.g. if some data is related to a particular city, use a well-known URI to refer to the city.

Exactly so, yes.

>
> One thing that we need to think about is that it is not so easy to find such URIs. It's more complicated than finding terms in ‘predicate vocabularies’ (e.g. using LOV) because there is much more duplication. For example, I would not be able to tell you what is the ‘best’ URI to identify me; would you use the one that I minted and maintain myself, or would it be better to use one that is maintained by a trusted organisation?

I can't answer that either of course. I have a URI for myself which I 
wish were different but I've had it a long time. I can hear Ivan Herman 
et al even now saying 'ORCID' ;-) But for other things, it's certainly hard.

> You refer to 
> http://www.w3.org/TR/ld-bp/#how-to-find-existing-vocabularies but that 
> seems to be mostly relevant for 'predicate vocabularies', not for 
> finding URIs for people etc. In some work I was involved in we tried to 
> rank external URIs in dimensions like well maintained, widely used, 
> freely available, global vs. regional scope, mono- vs. multilingual 
> descriptions -- with decisions driven by intended use and intended audience.

Interesting. Is that available? Could we offer those as metrics do you 
think?

>
> This first issue is basically concerned with pointing from inside data to the outside world. The second issue is the other way around, pointing from the outside in.
>
> 2. Identification of parts of a dataset. I think that is what you mean by ‘data point’ but maybe that term is not the best, as it seems to imply some numerical value for an observation. I myself would favour a term like ‘part of a dataset’ or ‘data item’.

Of your two terms I prefer data item. Just as you say above, I mean use 
a URI for things like a city. If data point implies a numerical value 
then I mean data item.


>
> On this second issue, we may need to include some warnings. In some cases, a part of a dataset by itself may not be understandable without access to information about the dataset as a whole; e.g. for an observation, you may need to know how and why it was observed; for an article in a law, you may need to know what a particular term means in this specific context.

Yes.

A simple data item:

age: 37

is certainly meaningless without context.

I think you're talking here about the issue of defining a subset of a 
dataset. So if the dataset is 'all temperature records for Spanish 
cities since they began,' I might just want the subset about Barcelona 
in 2014.

This is an issue that has come to the fore in the Spatial Data on the 
Web WG, in particular, in the context of satellite imagery (the data 
volume of which is enormous). As things stand, the SDW is likely to 
tackle this since geo and temporal restrictions are often exactly the 
kind of subset that's required (my example about Barcelona temperature 
records was not chosen at random).

Jeremy suggested that a likely starting point would be Open Search 
(http://www.opensearch.org/) with the geotemporal extensions defined by 
OGC (http://www.opengeospatial.org/standards/opensearchgeo). I can't 
pretend I've looked at these in any kind of detail but, if we're talking 
about the same thing then most definitely yes, the subset would need to 
include all the contextualising metadata to make sense of the subset.
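
As a minimal sketch of that last point, assuming a purely hypothetical 
in-memory dataset (the field names and the values are made up for 
illustration and come from no real source), a subset request can copy the 
dataset-level metadata into its result so that the subset still makes 
sense on its own:

```python
from datetime import date

# Hypothetical dataset: structure, field names and values are illustrative only.
dataset = {
    "metadata": {
        "title": "Temperature records for Spanish cities",
        "unit": "degrees Celsius",
        "method": "daily mean, ground station",
    },
    "records": [
        {"city": "Barcelona", "date": date(2014, 7, 1), "value": 24.1},
        {"city": "Barcelona", "date": date(2013, 7, 1), "value": 23.4},
        {"city": "Madrid", "date": date(2014, 7, 1), "value": 27.8},
    ],
}


def subset(ds, city, year):
    """Return the matching records together with the dataset-level metadata."""
    records = [
        r for r in ds["records"]
        if r["city"] == city and r["date"].year == year
    ]
    # Copy the context and record which filter produced this subset.
    metadata = dict(ds["metadata"])
    metadata["subset_of"] = ds["metadata"]["title"]
    metadata["filter"] = {"city": city, "year": year}
    return {"metadata": metadata, "records": records}


barcelona_2014 = subset(dataset, "Barcelona", 2014)
```

Here barcelona_2014 carries the unit and the observation method along 
with the single matching record. This says nothing about how Open Search 
or the OGC extensions express such a query.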

Would that cover your second issue?

Phil.



> One approach would certainly be to create URIs that are in some way 
> derived from the dataset URI, which I understand is the approach of CSVW 
> at http://www.w3.org/TR/tabular-metadata/#uri-template-properties. 
> However, in the absence of a ‘standard’ way of creating ‘item URIs’ from 
> dataset URIs, it may not be possible to know what the dataset URI is 
> from looking at the item URI, at least not in a machine-readable way.
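
To illustrate the point about deriving item URIs, here is a rough sketch 
under stated assumptions: the dataset URI, the fragment-based naming 
convention and the expand() helper are all hypothetical, and the helper 
handles only simple {name} variables rather than the full URI Template 
syntax. It shows that the dataset URI is recoverable from an item URI only 
because both sides agree on the convention:

```python
import re
from urllib.parse import quote, urldefrag

# Hypothetical dataset URI and item-URI convention, for illustration only.
DATASET_URI = "http://example.org/data/temperatures"
ITEM_TEMPLATE = DATASET_URI + "#{city}-{year}"


def expand(template, values):
    """Expand simple {name} variables, percent-encoding each value."""
    return re.sub(
        r"\{(\w+)\}",
        lambda m: quote(str(values[m.group(1)]), safe=""),
        template,
    )


item_uri = expand(ITEM_TEMPLATE, {"city": "Barcelona", "year": 2014})

# Stripping the fragment recovers the dataset URI, but only under the
# assumed convention that items are fragments of the dataset URI.
dataset_uri, fragment = urldefrag(item_uri)
```

With a path-based or otherwise opaque item URI there would be no such 
mechanical step, which is the difficulty described above.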
>
> So in summary, I think that advice could be given, but I think that there need to be two separate BPs for them.
>
> Makx.
>
>
>

-- 


Phil Archer
W3C Data Activity Lead
http://www.w3.org/2013/data/

http://philarcher.org
+44 (0)7887 767755
@philarcher1
Received on Sunday, 11 October 2015 18:01:37 UTC