- From: Rachel Heery <r.heery@ukoln.ac.uk>
- Date: Tue, 28 Oct 2003 12:52:18 +0000 (GMT)
- To: "Butler, Mark" <Mark_Butler@hplb.hpl.hp.com>
- Cc: SIMILE public list <www-rdf-dspace@w3.org>
On Mon, 27 Oct 2003, Butler, Mark wrote:

> Are you aware of any literature on name canonicalization? It's just that
> it's such a common problem, people have been trying to integrate disparate
> databases since the 70's, so it's possible someone has published a survey
> paper on this? I did a quick search this morning, but I'm guessing they may
> have used another term apart from name canonicalization.

I think the phrase would be 'name authority matching' to pick up more
library-oriented literature.

A very quick search shows up the following, nothing with lots of
references though :-(

D-Lib Magazine, April 2001, Volume 7, Number 4
Tim DiLauro et al.
Automated Name Authority Control and Enhanced Searching in the Levy
Collection
http://www.dlib.org/dlib/april01/dilauro/04dilauro.html

In ACM Digital Library:
Automated name authority control
James W. Warner, Elizabeth W. Brown
January 2001
Proceedings of the first ACM/IEEE-CS joint conference on Digital libraries

I recall (a long time ago), when I was involved in library data
conversions, that the matching algorithms used various forms of abbreviated
string matching, followed by matching 'known duplicate works'... a bit like
the matching algorithm mentioned in a page that emerged from Google:

http://www.indiana.edu/~libtserv/staff/retro/dissauth.html

<quote>
(4,3,1 = first four letters of the author's last name, first three letters
of the author's first name, first letter of the middle name) etc etc
</quote>

There is the LEAF project
http://www.crxnet.com/leaf/

and, of course, OCLC's Virtual International Authority File
http://www.oclc.org/research/projects/viaf/default.htm

Rachel

---------------------------------------------------------------------------
Rachel Heery
UKOLN
University of Bath                      tel: +44 (0)1225 826724
Bath, BA2 7AY, UK                       fax: +44 (0)1225 826838
http://www.ukoln.ac.uk/
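
As a rough illustration of the 4,3,1 key quoted above, here is a minimal Python sketch. It is not taken from the Indiana page: the function names, the "/" separator, and the lowercasing and accent-stripping steps are all assumptions made for the example. It only shows how such an abbreviated key might be built so that candidate duplicates can be grouped before any closer comparison.

```python
import re
import unicodedata


def _letters(part: str) -> str:
    """Lowercase a name part, strip accents, and keep only a-z.

    This normalisation is an assumption for the sketch, not part of
    the quoted 4,3,1 rule.
    """
    decomposed = unicodedata.normalize("NFKD", part)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z]", "", ascii_only.lower())


def match_key_431(last: str, first: str, middle: str = "") -> str:
    """Build a 4,3,1 key (hypothetical helper).

    Takes the first four letters of the last name, the first three of
    the first name, and the first letter of the middle name. The "/"
    separator is an arbitrary choice to keep the parts distinct.
    """
    return "/".join(
        (_letters(last)[:4], _letters(first)[:3], _letters(middle)[:1])
    )


if __name__ == "__main__":
    # Two spellings of the same author collapse to one key.
    print(match_key_431("Warner", "James", "W."))
    print(match_key_431("WARNER", "Jas.", "W"))
```

Records sharing a key are only candidates for a match; a key this short will also collide for unrelated authors, so a second pass (such as the 'known duplicate works' check mentioned above) would still be needed.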
Received on Tuesday, 28 October 2003 07:52:21 UTC