Re: Size matters -- How big is the danged thing

In message <4B27C45E67E94BA59C9550B7E7281D0E@ms>, Matthias Samwald 
<samwald@gmx.at> writes
>
>Rather than trying to do a rapid expansion over the whole web through 
>very light-weight, loose RDFization of all kinds of data, it might be 
>more rewarding to focus on creating rich, relatively consistent and 
>interoperable RDF/OWL representations of the information resources that 
>matter the most. Of course, this is not an either-or decision, as both 
>processes (the improvement in quality and the increase in quantity) 
>will happen in parallel. But I think that quality should have higher 
>priority than quantity, even if it might be harder to, uhm, quantify quality.

This is the sort of issue I am trying to get my head around in relation 
to my particular area of interest: the museums community.  I'm trying to 
form a view on what museum collections information systems could 
contribute to the Linked Data effort, and my current thinking is 
"objects in a historical context".

I've had a go at putting up one museum's 60,000 objects as 
not-very-linked-data; see, for example:

http://collections.wordsworth.org.uk/object/rdf/GRMDC.C104.15

which gives an idea of the sort of information that might be present 
(for Fine Art materials, anyway).
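
To give a flavour, here is a hand-written Turtle sketch of roughly 
what one of these descriptions contains (the literal values are 
invented, and the property choices are only indicative):

  @prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
  @prefix dbpprop: <http://dbpedia.org/property/> .

  # The persistent URI the museum has minted for the object
  <http://collections.wordsworth.org.uk/object/rdf/GRMDC.C104.15>
      rdfs:label     "View of Grasmere" ;        # title (value invented)
      dbpprop:artist "J. Bloggs" ;               # creator, as a bare literal
      dbpprop:medium "watercolour on paper" ;    # technique/material
      dbpprop:date   "1805" .                    # production date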

One no-brainer is that this sort of exercise allows museums to assign 
persistent URIs to their own objects, as I have done here.

Another obvious conclusion is that the museum community ought to get its 
act together and agree on a vocabulary/ontology for the predicates in 
these object descriptions. I'm currently using DBpedia properties, but 
there are frameworks like the CIDOC Conceptual Reference Model which 
might serve better.
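
To illustrate the difference, here is the same fact in the two styles 
(the CRM identifiers follow the published class/property numbering, 
but the namespace below is a placeholder, since the exact RDF binding 
varies between encodings):

  @prefix dbpprop: <http://dbpedia.org/property/> .
  @prefix crm:     <http://example.org/cidoc-crm#> .  # placeholder namespace

  # Flat, DBpedia-style statement:
  <http://collections.wordsworth.org.uk/object/rdf/GRMDC.C104.15>
      dbpprop:artist "J. Bloggs" .

  # Event-centric CIDOC CRM pattern: the object was produced by a
  # Production event, and that event was carried out by a Person
  <http://collections.wordsworth.org.uk/object/rdf/GRMDC.C104.15>
      crm:P108i_was_produced_by [
          a crm:E12_Production ;
          crm:P14_carried_out_by [ a crm:E21_Person ]   # the artist
      ] .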

After that it all gets a bit hazy.

I've made a hook-up to Geonames where there is an "exact match" on the 
place name in the data (the matching is done dynamically in the XSLT 
transform which generates the RDF).  I could, in principle, go to 
resources like the Getty AAT for techniques, etc., as and when they 
have APIs which allow me to query them and get XML back.
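
In the output RDF the Geonames hook-up is just an extra link, 
something like this (the feature id is a placeholder, and the linking 
property is one plausible choice among several):

  @prefix dbpprop: <http://dbpedia.org/property/> .

  # Extra link added when the place name exactly matches a Geonames entry
  <http://collections.wordsworth.org.uk/object/rdf/GRMDC.C104.15>
      dbpprop:location <http://sws.geonames.org/0000000/> .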

However, my biggest query is about people - in a museum/historical 
context, you're talking about all the people who ever lived, whether 
famous or not.  I could invent URIs for each person mentioned in the 
Wordsworth Trust data, and publish those, but then they would be locked 
into a single silo with no prospect of interoperability with any other 
museum's personal data.  Mapping names across thousands of museum triple 
stores is not a scalable option.
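
In other words, the best I can do unilaterally is something like this 
(URI pattern and name invented for illustration):

  @prefix foaf: <http://xmlns.com/foaf/0.1/> .

  # A person URI minted locally: stable, but opaque to every other
  # museum's triple store
  <http://collections.wordsworth.org.uk/person/p0123>
      a foaf:Person ;
      foaf:name "Dorothy Wordsworth" .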

So ... is there a case for "deadpeople.org", a site which does for 
historical people what Geonames does for place names?  ("dead" = "no 
data protection issues": I'm not just being macabre.)  The site should 
expect a constant flood of new people (and should issue a unique URI for 
each as it creates the central record), but should also allow queries 
against existing entries, so that the matching process can happen on a 
case-by-case basis in a central place, rather than being done after the 
event.
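
To sketch the shape of the thing (domain, URI pattern and properties 
all invented, of course):

  @prefix foaf: <http://xmlns.com/foaf/0.1/> .
  @prefix owl:  <http://www.w3.org/2002/07/owl#> .

  # Central record, created (and its URI minted) by the service the
  # first time this person is registered; later queries return it
  <http://deadpeople.org/person/123456>
      a foaf:Person ;
      foaf:name "Dorothy Wordsworth" .

  # Each museum then anchors its local identifier to the central one
  <http://collections.wordsworth.org.uk/person/p0123>
      owl:sameAs <http://deadpeople.org/person/123456> .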

-- 
Richard Light

Received on Thursday, 20 November 2008 12:28:46 UTC