Re: Fwd: Re: Question about MARCXML to Models transformation

From: Richard Light <richard@light.demon.co.uk>
Date: Wed, 9 Mar 2011 15:00:07 +0000
Message-ID: <0fjvSl33X5dNFwQL@light.demon.co.uk>
To: Karen Coyle <kcoyle@kcoyle.net>
Cc: Antoine Isaac <aisaac@few.vu.nl>, public-lld <public-lld@w3.org>
In message <20110309064234.101571g10sooyk0q@kcoyle.net>, Karen Coyle 
<kcoyle@kcoyle.net> writes
>Great stuff, Richard, thanks. A few more of these and maybe we can 
>develop a very rudimentary demo of what cataloging might look like at 
>some time in the future.

Thanks.  I've now updated it so that it returns a low certainty value if 
there is only one search term.  Also, the Geonames "demo" user has now 
used up its allocation for today, so you'll need to get your own Geonames 
username if you want to play with it (see blog [2] for details).
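
As a rough sketch of the behaviour described here (not the CGI's actual source, which is a Windows executable and is not shown in this thread), the following Python splits a place string into comma-separated terms, builds a Geonames search URL of the same shape as the one in the quoted example below, and scores the result. The function names, the COUNTRY_CODES table, the LOW_CERTAINTY value of 10, and the reading of hit1/hit2 as "this term matched" are all assumptions made for illustration.

```python
from urllib.parse import urlencode

GEONAMES_SEARCH = "http://api.geonames.org/search"

# Hypothetical lookup table: how the CGI maps a country name to a
# country code is not described in this thread.
COUNTRY_CODES = {"france": "FR", "united kingdom": "GB"}

# Assumed value: the thread only says "a low certainty value".
LOW_CERTAINTY = 10


def split_terms(q):
    """Split 'Paris, France' into ['Paris', 'France']."""
    return [t.strip() for t in q.split(",") if t.strip()]


def build_search_url(q, username="demo"):
    """Build a Geonames search URL from a place string."""
    terms = split_terms(q)
    if not terms:
        raise ValueError("empty place query")
    params = {"style": "short", "name_equals": terms[0]}
    if len(terms) > 1:
        # Only restrict by country when the second term is recognised.
        code = COUNTRY_CODES.get(terms[1].lower())
        if code is not None:
            params["country"] = code
    params["username"] = username
    return GEONAMES_SEARCH + "?" + urlencode(params)


def certainty(q, term_hits):
    """Return 100 only when two or more terms were given and all matched."""
    terms = split_terms(q)
    if len(terms) < 2:
        # A single term such as "Paris" is ambiguous on its own.
        return LOW_CERTAINTY
    if len(term_hits) == len(terms) and all(term_hits):
        return 100
    return LOW_CERTAINTY
```

With these assumptions, build_search_url("Paris, France") produces the same search URL as in the quoted result, and certainty("Paris", [True]) gives the low value rather than 100.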

If anyone wants the CGI (a Windows executable) to install on their own 
system, just ask.  No guarantees (about fitness for purpose etc.), 
though!

Richard

>Quoting Richard Light <richard@light.demon.co.uk>:
>
>> In message <4D776FDB.2060500@few.vu.nl>, Antoine Isaac 
>><aisaac@few.vu.nl> writes
>>>>
>>>>> But I'm really wondering why 1 would not be possible for quite
>>>>> easily identifiable entities like places and persons. With some
>>>>> basic tools that use existing linked data sources like Geonames, you
>>>>> could easily get something like a "did you mean
>>>>> http://sws.geonames.org/2988507/?" question (with a better
>>>>> interface, of course) that a cataloger can answer by yes or no, when
>>>>> "Paris" is filled in as place of publication.
>>>>
>>>> This "added" data is really the equivalent of the coded values in
>>>> MARC, in my mind. Where possible, systems need to create short-cuts so
>>>> that catalogers do not have to fill in both the text value and the
>>>> coded value (most of what is coded in MARC is redundant with data in
>>>> the textual fields). We need to make it so that catalogers have to do
>>>> *less* not *more* if we wish to get them on board. That's only fair.
>>>
>>> Good points. Perhaps that could also be something interesting to 
>>>mention in the report, in recommendations on how to change (if 
>>>possible) the way library data could be created or processed. So as 
>>>to make sure that the original work of librarians has maximum 
>>>impact in a more open environment...
>>
>> Similar issues arise in a museum context. One aspect of the problem 
>>when we are trying to convert string data to URLs is that we have a 
>>different sort of context from the running text which e.g. dbpedia 
>>Spotlight can annotate using NLP techniques.  However, this should, in 
>>principle, make life easier, since the data is of a known type.
>>
>> Following the recent Culture Grid Hack Day [1] I've written a simple 
>>CGI for place names [2] which will attempt to disambiguate strings 
>>such as:
>>
>> Paris, France
>>
>> so that:
>>
>> http://light.demon.co.uk/scripts/getPlaceURL.exe?q=Paris,%20France
>>
>> returns the XML:
>>
>> <result q="Paris, France" q1="Paris" q2="France" country="FR"
>>   url="http://api.geonames.org/search?style=short&name_equals=Paris&country=FR&username=demo"
>>   hits="5" geonameId="2988507"
>>   hierUrl="http://api.geonames.org/hierarchy?geonameId=2988507&username=demo"
>>   hit1="true" hit2="true"
>>   certainty="100">http://www.geonames.org/2988507/</result>
>>
>> However, if you just give it "Paris" there is no guarantee it will 
>>be able to help you. (It does, but it shouldn't!)
>>
>> Lightweight URL-ifier tools like this (I have in mind one for dates 
>>and date ranges) may enable cataloguers to include URLs for concepts 
>>at relatively low cost.
>>
>> Richard
>>
>> [1] http://www.culturegridhackday.org.uk/
>> [2] http://light.demon.co.uk/wordpress/?p=54
>>
>> --  Richard Light
>>
>>
>
>
>

-- 
Richard Light
Received on Wednesday, 9 March 2011 15:01:23 GMT
