RE: Recommendations: specificity from Andy Powell on 2011-03-30 (public-lld@w3.org from March 2011)

From: Andy Powell <andy.powell@eduserv.org.uk>
Date: Wed, 30 Mar 2011 11:46:49 +0100
To: Richard Light <richard@light.demon.co.uk>, Karen Coyle <kcoyle@kcoyle.net>
CC: public-lld <public-lld@w3.org>
Message-ID: <051279DC42F84849A64DA04141E1F955708B199713@edu-vmw-eml-l01.edu2000.com>
> We need to persuade publishers of vocabularies in our sector that the advent of LD brings
> with it a responsibility to re-publish in LD format, so that their users can get LD value from
> all the investment those users have made in using that vocabulary. This doesn't need to
> mean that the publisher loses all control over their investment.

Somewhat oddly, I suspect that one could make an argument that minting URIs and publishing Linked Data is actually a way of protecting the publishers' investment (because, otherwise, publishers will see their vocabularies begin to surface at multiple other places on the web, some of which will likely gain some traction).

Not sure how easy it will be to persuade people of this though ;-)

Andy

--
Andy Powell
Research Programme Director
Eduserv
t: 01225 474319
m: 07989 476710
twitter: @andypowe11
blog: efoundations.typepad.com

www.eduserv.org.uk 


-----Original Message-----
From: public-lld-request@w3.org [mailto:public-lld-request@w3.org] On Behalf Of Richard Light
Sent: 30 March 2011 11:21
To: Karen Coyle
Cc: public-lld
Subject: Re: Recommendations: specificity

In message <20110329100006.202654whce01ua6u@kcoyle.net>, Karen Coyle <kcoyle@kcoyle.net> writes
>
>LCSH is done, but Dewey is only available on a limited basis because 
>there are contractual constraints. Aside from that, though, one of the 
>issues that I see here is that many of these vocabularies are "owned"
>by single institutions and therefore we are dependent on those 
>institutions to issue them in RDF. Out of some frustration about this, 
>both Ross Singer and I have independently done some work on MARC 
>vocabularies. And look at what has happened with FRBR, which was not 
>provided by its "creator" body until many years after others had done 
>so. This is not a rant against those institutions but a real problem 
>that we need to deal with. Can we find a way to "communalize" more of 
>these vocabularies so that they can be converted in a more agile manner?

I agree it's a hard problem.  In my sector, the Getty vocabularies (AAT, ULAN, TGN) are another case in point.  You have an economic model where users have been willing to pay to use a curated vocabulary in their data, and the publishers of that vocabulary protected their investment in it by limiting access.  All perfectly reasonable.

Now these users want to publish their data as LD. Either they publish data from these vocabularies as strings (losing any LD benefit) or invent their own URL pattern (thereby creating a mini-silo).

We need to persuade publishers of vocabularies in our sector that the advent of LD brings with it a responsibility to re-publish in LD format, so that their users can get LD value from all the investment those users have made in using that vocabulary.

This doesn't need to mean that the publisher loses all control over their investment.  If each term/concept in a vocabulary is published as a separate "slash URL", it is unlikely that the whole vocabulary would be pirated from its LD representation.  Also, the RDF which is published doesn't need to contain every detail which is offered to paying customers: the key requirement is to have a published URL for each concept.
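
To make the "one slash URL per concept, minimal public RDF" idea concrete, here is a rough Python sketch. It is only an illustration under assumptions: the base URI is the made-up mygetty.org one used elsewhere in this thread, the record field names (scope_note, hierarchy) are invented, and the choice of SKOS prefLabel as the one public property is mine rather than anything a publisher has committed to.

```python
# Hypothetical base URI, borrowed from the illustrative URLs in this thread.
BASE = "http://mygetty.org/aat/"


def concept_uri(concept_id: str) -> str:
    """Return the slash URL for a single concept."""
    return BASE + concept_id


def public_turtle(concept_id: str, pref_label: str) -> str:
    """Serialise only the public subset of a concept as Turtle."""
    # Escape backslashes and double quotes so the label is a valid Turtle string.
    escaped = pref_label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n\n"
        f"<{concept_uri(concept_id)}> a skos:Concept ;\n"
        f'    skos:prefLabel "{escaped}"@en .\n'
    )


# Full record as a paying customer might hold it (field names are invented,
# and the elided values are placeholders, not real vocabulary content).
record = {
    "id": "300011666",
    "label": "Alberene stone",
    "scope_note": "...",
    "hierarchy": "...",
}

# Only the identifier and the label reach the public LD representation.
doc = public_turtle(record["id"], record["label"])
print(doc)
```

Each concept gets its own dereferenceable URL, and harvesting the whole vocabulary would mean one request per concept, with the richer fields never leaving the publisher.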

There is a related issue to consider once vocabularies are available as LD, which is whether to use URLs which reflect the term in the vocabulary, or to base them on the term's identifier within the vocabulary, i.e.:

http://mygetty.org/aat/300011666
or
http://mygetty.org/aat/Alberene_stone

I have found that there is a strong instinctive preference for "human-friendly" URLs, and of course this is what will typically be in users' data. However, it is arguable that they will actually be better served by the "meaningless" identifier (so long as it can easily be dereferenced, and the human-friendly info retrieved as required). Geonames is a good example where we manage to get along without "meaningful" URLs.  Another point for the report, maybe?
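
One argument sometimes made for the "meaningless" form is that it survives a change of preferred term. The Python sketch below illustrates that under assumptions: the base URI is again the made-up mygetty.org one, the dictionary stands in for dereferencing the URL to fetch the label, and the renamed term "Alberene soapstone" is invented purely for the example.

```python
from urllib.parse import quote

# Hypothetical base URI, borrowed from the illustrative URLs in this thread.
BASE = "http://mygetty.org/aat/"


def id_uri(concept_id: str) -> str:
    """URL based on the concept's identifier within the vocabulary."""
    return BASE + concept_id


def label_uri(label: str) -> str:
    """URL based on the term itself, with spaces turned into underscores."""
    return BASE + quote(label.replace(" ", "_"))


# Stand-in for dereferencing: identifier -> current preferred label.
labels = {"300011666": "Alberene stone"}

before = (id_uri("300011666"), label_uri(labels["300011666"]))

# Invented rename, to show what happens when the preferred term changes.
labels["300011666"] = "Alberene soapstone"

after = (id_uri("300011666"), label_uri(labels["300011666"]))

print("identifier URL stable:", before[0] == after[0])
print("label URL stable:     ", before[1] == after[1])
```

Under these assumptions the identifier-based URL in users' data keeps pointing at the same concept, while the label-based one would need a redirect or a data fix after the rename.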

Richard
--
Richard Light
Received on Wednesday, 30 March 2011 10:47:33 UTC