
Re: Fwd: Re: Question about MARCXML to Models transformation

From: Richard Light <richard@light.demon.co.uk>
Date: Wed, 9 Mar 2011 13:04:04 +0000
Message-ID: <whvOdG0Er3dNFwzH@light.demon.co.uk>
To: Antoine Isaac <aisaac@few.vu.nl>
Cc: public-lld <public-lld@w3.org>
In message <4D776FDB.2060500@few.vu.nl>, Antoine Isaac 
<aisaac@few.vu.nl> writes
>>
>>> But I'm really wondering why 1 would not be possible for quite
>>> easily identifiable entities like places and persons. With some
>>> basic tools that use existing linked data sources like Geonames, you
>>> could easily get something like a "did you mean
>>> http://sws.geonames.org/2988507/?" question (with a better
>>> interface, of course) that a cataloger can answer by yes or no, when
>>> "Paris" is filled in as place of publication.
>>
>> This "added" data is really the equivalent of the coded values in
>> MARC, in my mind. Where possible, systems need to create short-cuts so
>> that catalogers do not have to fill in both the text value and the
>> coded value (most of what is coded in MARC is redundant with data in
>> the textual fields). We need to make it so that catalogers have to do
>> *less* not *more* if we wish to get them on board. That's only fair.
>
>Good points. Perhaps that could be also something interesting to 
>mention in the report, in recommendations on how to change (if 
>possible) the way library data could be created or processed. So as to 
>make sure that the original work of librarians has maximum impact in a 
>more open environment...

Similar issues arise in a museum context. One aspect of the problem of 
converting string data to URLs is that catalogue fields give us a 
different sort of context from the running text which e.g. DBpedia 
Spotlight can annotate using NLP techniques.  However, this should, in 
principle, make life easier, since the data is of a known type.

Following the recent Culture Grid Hack Day [1] I've written a simple CGI 
for place names [2] which will attempt to disambiguate strings such as:

Paris, France

so that:

http://light.demon.co.uk/scripts/getPlaceURL.exe?q=Paris,%20France

returns the XML:

<result q="Paris, France" q1="Paris" q2="France" country="FR"
  url="http://api.geonames.org/search?style=short&name_equals=Paris&country=FR&username=demo"
  hits="5" geonameId="2988507"
  hierUrl="http://api.geonames.org/hierarchy?geonameId=2988507&username=demo"
  hit1="true" hit2="true"
  certainty="100">http://www.geonames.org/2988507/</result>

However, if you just give it "Paris" there is no guarantee it will be 
able to help you. (It does, but it shouldn't!)
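
The message does not describe how the CGI works internally, so the 
following is only a rough Python sketch of the first step it appears to 
perform: split the string at the comma, map the second part to a country 
code, and build the GeoNames search URL. The query parameters are taken 
from the url attribute in the result above; the function name 
build_place_query and the tiny COUNTRY_CODES table are hypothetical, 
invented for illustration. No network request is made.

```python
from urllib.parse import urlencode

# Hypothetical lookup table for illustration only; a real tool would
# need a complete list of country names and ISO 3166 codes.
COUNTRY_CODES = {"france": "FR", "united kingdom": "GB", "united states": "US"}

GEONAMES_SEARCH = "http://api.geonames.org/search"


def build_place_query(q, username="demo"):
    """Split 'Place, Country' and build a GeoNames search URL."""
    parts = [p.strip() for p in q.split(",", 1)]
    q1 = parts[0]
    q2 = parts[1] if len(parts) > 1 else None
    # Without a recognised country qualifier the search is unconstrained.
    country = COUNTRY_CODES.get(q2.lower()) if q2 else None

    params = [("style", "short"), ("name_equals", q1)]
    if country:
        params.append(("country", country))
    params.append(("username", username))
    return {
        "q1": q1,
        "q2": q2,
        "country": country,
        "url": GEONAMES_SEARCH + "?" + urlencode(params),
    }


if __name__ == "__main__":
    print(build_place_query("Paris, France")["url"])
    print(build_place_query("Paris")["url"])
```

With a bare "Paris" the sketch has no country to constrain the search, 
which is why any single answer for that input has to be treated as a 
guess.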

Lightweight URL-ifier tools like this (I have in mind one for dates and 
date ranges) may enable cataloguers to include URLs for concepts at 
relatively low cost.

Richard

[1] http://www.culturegridhackday.org.uk/
[2] http://light.demon.co.uk/wordpress/?p=54

-- 
Richard Light
Received on Wednesday, 9 March 2011 13:05:55 GMT
