RE: Webby Data from Makx Dekkers on 2015-10-11 (public-dwbp-wg@w3.org from October 2015)

From: Makx Dekkers <mail@makxdekkers.com>
Date: Sun, 11 Oct 2015 11:47:31 +0200
To: "'Phil Archer'" <phila@w3.org>, "'Public DWBP WG'" <public-dwbp-wg@w3.org>
Cc: "'Erik Wilde'" <dret@berkeley.edu>, "'Tandy, Jeremy'" <jeremy.tandy@metoffice.gov.uk>
Message-ID: <000001d10409$da6cb870$8f462950$@makxdekkers.com>

Hi Phil,

Useful thoughts on URIs within datasets.

If I understand correctly, you’re addressing two separate issues:

1. Identification of real-world entities (people, places, concepts etc.) that are referred to within the data, using well-known URIs; e.g. if some data is related to a particular city, use a well-known URI to refer to that city.

One thing that we need to think about is that it is not so easy to find such URIs. It's more complicated than finding terms in ‘predicate vocabularies’ (e.g. using LOV) because there is much more duplication. For example, I would not be able to tell you which is the ‘best’ URI to identify me: would you use the one that I minted and maintain myself, or would it be better to use one that is maintained by a trusted organisation? You refer to http://www.w3.org/TR/ld-bp/#how-to-find-existing-vocabularies but that seems to be mostly relevant for 'predicate vocabularies', not for finding URIs for people etc. In some work I was involved in, we tried to rank external URIs along dimensions like well maintained, widely used, freely available, global vs. regional scope, and mono- vs. multilingual descriptions, with decisions driven by intended use and intended audience.

This first issue is basically concerned with pointing from inside the data to the outside world. The second issue is the other way around: pointing from the outside in.

2. Identification of parts of a dataset. I think that is what you mean by ‘data point’, but maybe that term is not the best, as it seems to imply some numerical value for an observation. I myself would favour a term like ‘part of a dataset’ or ‘data item’.

On this second issue, we may need to include some warnings. In some cases, a part of a dataset by itself may not be understandable without access to information about the dataset as a whole; e.g. for an observation, you may need to know how and why it was observed; for an article in a law, you may need to know what a particular term means in this specific context. One approach would certainly be to create URIs that are in some way derived from the dataset URI, which I understand is the approach of CSVW at http://www.w3.org/TR/tabular-metadata/#uri-template-properties. However, in the absence of a ‘standard’ way of creating ‘item URIs’ from dataset URIs, it may not be possible to know what the dataset URI is from looking at the item URI, at least not in a machine-readable way.
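To illustrate the point about deriving item URIs, here is a minimal Python sketch, assuming a CSVW-style approach in which a URI template is expanded with values from a row and the result is resolved against the dataset URL. The templates "#item-{id}" and "trees/{id}", the URL http://example.org/data/trees.csv and the helper names are hypothetical, and only simple {name} expansion is handled (not full RFC 6570). It shows that a heuristic such as "strip the fragment" recovers the dataset URI for one convention but not for another, which is why, without a standard convention, the dataset URI cannot reliably be determined from the item URI.

```python
import re
from urllib.parse import quote, urldefrag, urljoin


def expand(template, values):
    # Simple {name} expansion only (RFC 6570 level 1 style):
    # values are percent-encoded except for unreserved characters.
    return re.sub(
        r"\{([A-Za-z0-9_]+)\}",
        lambda m: quote(str(values[m.group(1)]), safe=""),
        template,
    )


def item_uri(dataset_url, template, row):
    # Hypothetical helper: expand the template, then resolve it
    # against the dataset URL, as CSVW does for aboutUrl.
    return urljoin(dataset_url, expand(template, row))


def guess_dataset(uri):
    # Naive heuristic: assume items are fragments of the dataset.
    return urldefrag(uri).url


dataset = "http://example.org/data/trees.csv"
uri_a = item_uri(dataset, "#item-{id}", {"id": 42})  # fragment-based convention
uri_b = item_uri(dataset, "trees/{id}", {"id": 42})  # path-based convention

print(uri_a, guess_dataset(uri_a) == dataset)
print(uri_b, guess_dataset(uri_b) == dataset)
```

With the fragment-based template the heuristic works; with the path-based one it silently yields the wrong URI, and nothing in the item URI itself tells a machine which convention was used.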

So in summary, I think that advice could be given on both issues, but there need to be two separate BPs for them.

Makx.

Received on Sunday, 11 October 2015 09:48:06 UTC