RE: Record Linkage in Simile

On Mon, 27 Oct 2003, Butler, Mark wrote:

>
> Are you aware of any literature on name canonicalization? It's just that
> it's such a common problem; people have been trying to integrate disparate
> databases since the 70s, so it's possible someone has published a survey
> paper on this. I did a quick search this morning, but I'm guessing they may
> have used another term apart from name canonicalization.

I think the phrase would be 'name authority matching' to pick up more
library-oriented literature. A very quick search turns up the following,
though nothing with lots of references :-(

D-Lib Magazine April 2001 Volume 7 Number 4
Tim DiLauro et al.
Automated Name Authority Control and Enhanced Searching in the Levy
Collection
http://www.dlib.org/dlib/april01/dilauro/04dilauro.html

In ACM Digital Library:
James W. Warner, Elizabeth W. Brown
Automated name authority control
Proceedings of the first ACM/IEEE-CS joint conference on Digital
libraries, January 2001

I recall that (a long time ago), when I was involved in library data
conversions, the matching algorithms used various abbreviated string
matches, followed by matching 'known duplicate works'... a bit like the
matching algorithm mentioned on a page that turned up in Google:
http://www.indiana.edu/~libtserv/staff/retro/dissauth.html
<quote>
(4,3,1 = first four letters of the author's
last name, first three letters of the author's first name, first letter of
the middle name) etc etc
</quote>
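
As a rough illustration of the 4,3,1 key quoted above, here is a minimal
Python sketch. The function names, the tuple form of the key, and the
lower-casing and punctuation stripping are all assumptions made for the
example; they are not taken from the Indiana page or from any particular
library system.

```python
from collections import defaultdict


def match_key(last, first, middle=""):
    """Build a hypothetical 4,3,1 match key from the parts of a name.

    Takes the first four letters of the last name, the first three of
    the first name, and the first letter of the middle name. Case and
    non-letter characters are ignored (an assumption of this sketch).
    """
    def head(text, n):
        letters = [c for c in text.lower() if c.isalpha()]
        return "".join(letters[:n])

    return (head(last, 4), head(first, 3), head(middle, 1))


def candidate_groups(names):
    """Group (last, first, middle) tuples that share the same key.

    Only groups with more than one member are returned; these are
    candidates for review, not confirmed duplicates.
    """
    groups = defaultdict(list)
    for name in names:
        groups[match_key(*name)].append(name)
    return [group for group in groups.values() if len(group) > 1]
```

A key this short will obviously throw up false matches, which is presumably
why a further step such as matching 'known duplicate works' would follow it.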

There is the LEAF project
http://www.crxnet.com/leaf/

and OCLC of course
Virtual International Authority File
http://www.oclc.org/research/projects/viaf/default.htm

Rachel

---------------------------------------------------------------------------
Rachel Heery
UKOLN
University of Bath                              tel: +44 (0)1225 826724
Bath, BA2 7AY, UK                               fax: +44 (0)1225 826838
http://www.ukoln.ac.uk/

Received on Tuesday, 28 October 2003 07:52:21 UTC