RE: Webby Data from Makx Dekkers on 2015-10-11 (public-dwbp-comments@w3.org from October 2015)

From: Makx Dekkers <mail@makxdekkers.com>
Date: Sun, 11 Oct 2015 21:37:21 +0200
To: "'Phil Archer'" <phila@w3.org>, <public-dwbp-comments@w3.org>
Cc: "'Erik Wilde'" <dret@berkeley.edu>, "'Tandy, Jeremy'" <jeremy.tandy@metoffice.gov.uk>
Message-ID: <000001d1045c$43327e60$c9977b20$@makxdekkers.com>

Thanks Phil,

[...snip...]

> for people etc. In some work I was involved in we tried to rank external URIs
> in dimensions like well maintained, widely used, freely available, global vs.
> regional scope, mono- vs. multilingual descriptions -- with decisions driven by
> intended use and intended audience.
> 
> Interesting. Is that available? Could we offer those as metrics do you think?
> 

This is part of the DCAT-AP specification linked from https://joinup.ec.europa.eu/asset/dcat_application_profile/asset_release/dcat-application-profile-data-portals-europe-final. A revised version of the profile will be published shortly and still contains this list. 

It says:

Controlled vocabularies SHOULD:

• Be published under an open licence.
• Be operated and/or maintained by an institution of the European Union, by a recognised standards organisation or another trusted organisation.
• Be properly documented.
• Have labels in multiple languages, ideally in all official languages of the European Union.
• Contain a relatively small number of terms (e.g. 10-25) that are general enough to enable a wide range of resources to be classified.
• Have terms that are identified by URIs with each URI resolving to documentation about the term.
• Have associated persistence and versioning policies. 

The second bullet puts more emphasis on maintenance at the European level because that is the scope of the DCAT-AP. The fifth bullet (the small number of terms) is mostly relevant for classifications and, of course, not for URI collections identifying things like countries and languages.

But I wouldn't call those criteria 'metrics'. They are meant to be things that you may want to consider.

[...snip...]
> 
> I think you're talking here about the issue of defining a subset of a dataset.
> So if the dataset is 'all temperature records for Spanish cities since they
> began,' I might just want the subset about Barcelona in 2014.
> 

Yes, this is about subsets, but could also apply to a single data item -- I am thinking about the example of an article in a law.

[...snip...]

> pretend I've looked at these in any kind of detail but, if we're talking about
> the same thing then most definitely yes, the subset would need to include all
> the contextualising metadata to make sense of the subset.
> 
> Would that cover your second issue?

Yes, definitely, but we also need to look at reality. Organisations sometimes already have difficulty creating 'good' metadata descriptions of their datasets because of a lack of resources; while I strongly agree that it is good practice to provide metadata for every subset that is identified by a URI, we may not see very much of that in practice.

A practical option that could be mentioned is an approach in which subset URIs and item URIs are minted based on the dataset URI -- as CSVW and the European Legislation Identifier do -- so that at least a human user can try stripping off parts at the end of the URI to get to the full dataset (or its metadata).
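
As a rough illustration of that stripping approach, here is a minimal Python sketch. The function name candidate_parents and the example.org URIs are made up for illustration; neither CSVW nor ELI defines such a function, and there is no guarantee that every shortened URI actually resolves to anything.

```python
from urllib.parse import urlsplit, urlunsplit


def candidate_parents(uri):
    """Yield progressively shorter URIs obtained by stripping parts off the end.

    Hypothetical helper: it only does string-level truncation and never
    checks whether a candidate URI resolves.
    """
    parts = urlsplit(uri)

    # First drop any query string or fragment (e.g. a '#row=5' style fragment).
    if parts.query or parts.fragment:
        yield urlunsplit(parts._replace(query="", fragment=""))

    # Then drop trailing path segments one at a time, keeping at least one.
    segments = [s for s in parts.path.split("/") if s]
    for end in range(len(segments) - 1, 0, -1):
        path = "/" + "/".join(segments[:end])
        yield urlunsplit((parts.scheme, parts.netloc, path, "", ""))


if __name__ == "__main__":
    # Assumed URI layout: dataset / subset / item.
    for candidate in candidate_parents(
        "http://example.org/dataset/temperatures/barcelona/2014"
    ):
        print(candidate)
```

A client (or a person) could try each candidate in turn until one returns a description of the enclosing dataset.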

Makx.

Received on Sunday, 11 October 2015 19:38:05 UTC