RE: Record Linkage in Simile (also canonicalizing names in SIMILE)

Nick, 

To save you some effort with CVS, I've used a technique similar to Kevin's
to extract all the personal names from the Artstor data. Here are the
extracted names from the Artstor and OCW corpora; they should give you an
idea of the kind of data we need to deal with in the demo.

The OCW data set contains a number of names that need record linkage, e.g.
Dower, John W
John Dower
Tania Baker
Tania A. Baker
Miyagawa, Shigeru 
Shigeru Miyagawa
Prof. Shigeru Miyagawa
Peters, W. T.
Peters, W.T.

but I can only spot one linkage between the two data sets:
OCW: Goya, Francisco, 1746-1828.
Artstor: Goya, Francisco,1746-1828
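
As a rough illustration of the kind of normalisation these pairs would need,
here is a minimal Python sketch. It is not the technique used to extract the
names, and the names name_key, link_names and TITLES are hypothetical. It
assumes a name is either "Surname, Given" or "Given Surname", optionally
followed by a life-date range, and it keys each name on the surname plus the
first given-name token.

```python
import re
from collections import defaultdict

# Hypothetical list of honorifics to ignore; extend as the data requires.
TITLES = {"prof", "dr", "mr", "mrs", "ms"}


def name_key(raw):
    """Return a crude (surname, first given token) key for a personal name."""
    # Drop a life-date range such as "1746-1828", then stray punctuation.
    s = re.sub(r"\d{4}\s*-\s*\d{4}", "", raw).strip(" .,")
    if "," in s:
        # Assumed form: "Surname, Given ..."
        surname, given = s.split(",", 1)
    else:
        # Assumed form: "[Title] Given ... Surname"
        parts = s.split()
        if not parts:
            return ("", "")
        surname, given = parts[-1], " ".join(parts[:-1])
    # Split the given names on whitespace and full stops, dropping titles.
    tokens = [t for t in re.split(r"[\s.]+", given)
              if t and t.lower() not in TITLES]
    first = tokens[0].lower() if tokens else ""
    return (surname.strip().lower(), first)


def link_names(names):
    """Group names that share a key; return only groups with 2+ members."""
    groups = defaultdict(list)
    for name in names:
        groups[name_key(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]
```

Calling link_names on the variants listed above groups each pair or triple
together, including the two Goya entries. The key is deliberately crude: it
would wrongly merge different people who share a surname and first name, and
it would miss a variant that gives only an initial where another gives the
full first name.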

Dr Mark H. Butler
Research Scientist                HP Labs Bristol
mark-h_butler@hp.com
Internet: http://www-uk.hpl.hp.com/people/marbut/

Received on Monday, 27 October 2003 07:56:47 UTC