
RE: Record Linkage in Simile (also canonicalizing names in SIMILE )

From: Butler, Mark <Mark_Butler@hplb.hpl.hp.com>
Date: Mon, 27 Oct 2003 12:51:27 -0000
Message-ID: <E864E95CB35C1C46B72FEA0626A2E808206211@0-mail-br1.hpl.hp.com>
To: "'Nick Matsakis'" <matsakis@mit.edu>, SIMILE public list <www-rdf-dspace@w3.org>
Nick, 

To save you some effort with CVS, I've used a similar technique to
Kevin's to extract all the personal names from the Artstor data. So here are
the extracted names from the Artstor and OCW corpora; this will give you an
idea of the kind of data we need to deal with in the demo. 

There are a number of examples of record linkage in the OCW data set, e.g.:
Dower, John W
John Dower
Tania Baker
Tania A. Baker
Miyagawa, Shigeru 
Shigeru Miyagawa
Prof. Shigeru Miyagawa
Peters, W. T.
Peters, W.T.

but I can only spot one linkage between the two datasets:
OCW: Goya, Francisco, 1746-1828.
Artstor: Goya, Francisco,1746-1828
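
As a rough illustration of how the variants above could be matched, here is a minimal Python sketch. It is not the SIMILE implementation: the function names `name_key` and `group_names`, the title list, and the choice of a (surname, first initial) key are all assumptions made for this example. The key is deliberately crude and would also merge distinct people who share a surname and first initial.

```python
import re
from collections import defaultdict

# Honorifics to ignore; an illustrative list, not derived from the data.
TITLES = {"prof", "dr", "mr", "mrs", "ms"}


def _words(text):
    """Lower-cased alphabetic tokens, with punctuation and digits dropped."""
    return [w.casefold() for w in re.findall(r"[^\W\d_]+", text)]


def name_key(raw):
    """Return a crude (surname, first initial) key for a personal name."""
    # Remove life dates such as "1746-1828".
    text = re.sub(r"\d{4}(\s*-\s*\d{4})?", "", raw)
    if "," in text:
        # "Surname, Given ..." order: split at the first comma.
        surname_part, _, given_part = text.partition(",")
        surname = _words(surname_part)
        given = [w for w in _words(given_part) if w not in TITLES]
    else:
        # "Given ... Surname" order: assume the last word is the surname.
        tokens = [w for w in _words(text) if w not in TITLES]
        surname, given = tokens[-1:], tokens[:-1]
    initial = given[0][0] if given else ""
    return (" ".join(surname), initial)


def group_names(names):
    """Group raw name strings that share the same key."""
    groups = defaultdict(list)
    for name in names:
        groups[name_key(name)].append(name.strip())
    return dict(groups)


if __name__ == "__main__":
    sample = [
        "Dower, John W", "John Dower",
        "Tania Baker", "Tania A. Baker",
        "Miyagawa, Shigeru ", "Prof. Shigeru Miyagawa",
        "Peters, W. T.", "Peters, W.T.",
        "Goya, Francisco, 1746-1828.", "Goya, Francisco,1746-1828",
    ]
    for key, variants in group_names(sample).items():
        print(key, variants)
```

A key this coarse is only a first-pass blocking step; candidate groups would still need checking by a stricter comparison or by hand.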

Dr Mark H. Butler
Research Scientist                HP Labs Bristol
mark-h_butler@hp.com
Internet: http://www-uk.hpl.hp.com/people/marbut/



Received on Monday, 27 October 2003 07:56:47 EST
