Re: Recommendations: specificity from Richard Light on 2011-03-30 (public-lld@w3.org from March 2011)

From: Richard Light <richard@light.demon.co.uk>
Date: Wed, 30 Mar 2011 11:21:16 +0100
To: Karen Coyle <kcoyle@kcoyle.net>
Cc: public-lld <public-lld@w3.org>
Message-ID: <brQkvFBcQwkNFwvy@light.demon.co.uk>

In message <20110329100006.202654whce01ua6u@kcoyle.net>, Karen Coyle 
<kcoyle@kcoyle.net> writes
>
>LCSH is done, but Dewey is only available on a limited basis because 
>there are contractual constraints. Aside from that, though, one of the 
>issues that I see here is that many of these vocabularies are "owned" 
>by single institutions and therefore we are dependent on those 
>institutions to issue them in RDF. Out of some frustration about this, 
>both Ross Singer and I have independently done some work on MARC 
>vocabularies. And look at what has happened with FRBR, which was not 
>provided by its "creator" body until many years after others had done 
>so. This is not a rant against those institutions but a real problem 
>that we need to deal with. Can we find a way to "communalize" more of 
>these vocabularies so that they can be converted in a more agile manner?

I agree it's a hard problem.  In my sector, the Getty vocabularies (AAT, 
ULAN, TGN) are another case in point.  You have an economic model where 
users have been willing to pay to use a curated vocabulary in their 
data, and the publishers of that vocabulary protected their investment 
in it by limiting access.  All perfectly reasonable.

Now these users want to publish their data as Linked Data (LD). Either they publish 
data from these vocabularies as strings (losing any LD benefit) or 
invent their own URL pattern (thereby creating a mini-silo).

We need to persuade publishers of vocabularies in our sector that the 
advent of LD brings with it a responsibility to re-publish in LD format, 
so that their users can get LD value from all the investment those users 
have made in using that vocabulary.

This doesn't need to mean that the publisher loses all control over 
their investment.  If each term/concept in a vocabulary is published as 
a separate "slash URL", it is unlikely that the whole vocabulary would 
be pirated from its LD representation.  Also, the RDF which is published 
doesn't need to contain every detail which is offered to paying 
customers: the key requirement is to have a published URL for each 
concept.
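To make the "one slash URL per concept, minimal public RDF" idea concrete, here is a rough sketch in Python. It assumes a hypothetical base URI (the mygetty.org pattern used as an example in this message, not a real Getty endpoint), uses SKOS for the published description, and assumes English labels; the publisher keeps any extra fields (scope notes etc.) for paying customers and exposes only a type and a label per concept.

```python
# Hypothetical base URI, taken from the example in this message;
# it is NOT a real Getty endpoint.
BASE = "http://mygetty.org/aat/"

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def concept_uri(concept_id: str) -> str:
    """Mint one 'slash URL' per concept: BASE + the concept's identifier."""
    return BASE + concept_id


def public_triples(concept_id: str, record: dict) -> list:
    """Emit a minimal N-Triples description of a single concept.

    Only rdf:type and skos:prefLabel are published; every other field in
    `record` stays with the paid service. The '@en' language tag is an
    assumption. Escaping is minimal (backslash and double quote only).
    """
    s = f"<{concept_uri(concept_id)}>"
    label = record["label"].replace("\\", "\\\\").replace('"', '\\"')
    return [
        f"{s} <{RDF_TYPE}> <{SKOS}Concept> .",
        f'{s} <{SKOS}prefLabel> "{label}"@en .',
    ]


if __name__ == "__main__":
    record = {"label": "Alberene stone", "scope_note": "subscribers only"}
    for triple in public_triples("300011666", record):
        print(triple)
```

Because each concept lives at its own URL, a harvester has to fetch concepts one by one, and what it gets is only the thin public layer, not the full curated record.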

There is a related issue to consider once vocabularies are available as 
LD, which is whether to use URLs which reflect the term in the 
vocabulary, or to base them on the term's identifier within the 
vocabulary, e.g.:

http://mygetty.org/aat/300011666
or
http://mygetty.org/aat/Alberene_stone

I have found that there is a strong instinctive preference for 
"human-friendly" URLs, and of course the term itself is what will 
typically be in users' data. However, it is arguable that they will actually be better 
served by the "meaningless" identifier (so long as it can easily be 
dereferenced, and the human-friendly info retrieved as required). 
Geonames is a good example where we manage to get along without 
"meaningful" URLs.  Another point for the report, maybe?

Richard
-- 
Richard Light

Received on Wednesday, 30 March 2011 10:23:03 UTC