Name canonicalization

Hi all,

I've been working on name canonicalization in the OCW repository. Today I 
wrote a script that handles all of the names found there, either by 
pattern or by using a short list of exceptions. I am now starting to 
implement a web service interface to ask the OCLC for a URI for each 
name. 
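
For illustration, a pattern-plus-exceptions canonicalizer might look 
roughly like the Python sketch below. This is not the actual script: the 
title pattern, the contents of the exception table, and the 
"Family, Given" target form are all assumptions made for the example 
(the target form is borrowed from the "Smith, John" value in the graph).

```python
import re

# Hypothetical exception table: raw strings that no pattern handles well,
# mapped by hand to their canonical form.
EXCEPTIONS = {
    "Prof. J. Smith et al.": "Smith, J.",
}

# Assumed pattern: an optional courtesy title at the start of the name.
_TITLE = re.compile(r"^(?:Prof\.|Dr\.|Mr\.|Ms\.|Mrs\.)\s+", re.IGNORECASE)


def canonicalize(raw: str) -> str:
    """Return an assumed canonical "Family, Given" form of a personal name."""
    # Collapse runs of whitespace and trim the ends.
    name = " ".join(raw.split())

    # The short list of exceptions takes priority over any pattern.
    if name in EXCEPTIONS:
        return EXCEPTIONS[name]

    # Strip a leading title such as "Dr." before applying the patterns.
    name = _TITLE.sub("", name)

    # Already in "Family, Given" order: just normalize the spacing.
    if "," in name:
        family, given = (part.strip() for part in name.split(",", 1))
        return f"{family}, {given}" if given else family

    # Otherwise treat the last word as the family name.
    parts = name.split(" ")
    if len(parts) == 1:
        return parts[0]
    return f"{parts[-1]}, {' '.join(parts[:-1])}"
```

Checking the exception table first keeps the patterns simple, since any 
name they would mangle can be listed by hand instead.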

Based on the work I've already done on the OCLC web service, the results 
of each request will be a set of possible matches, including both phrase 
matches and word matches.  I'm thinking of turning these into a graph 
that looks like this for some IMS record <A>, some person <V>, and a set 
of OCLC results <A1>, <B1>, <B2>:

<A> dc:title "Something" ;
    loc-life:author <V> .

<V> vc:FN "Smith, John" ;
    oclc:probablyMatches <A1> ;
    oclc:possiblyMatches <B1> ;
    oclc:possiblyMatches <B2> .
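
As a rough sketch of how the match statements for one person could be 
generated from a result set, the Python below emits one statement per 
candidate. It assumes that phrase matches map to oclc:probablyMatches 
and word matches to oclc:possiblyMatches; that mapping is a guess, not 
something settled here. The function name match_triples is hypothetical, 
and the identifiers passed in are the placeholders from the graph above, 
not real URIs.

```python
def match_triples(person, phrase_matches, word_matches):
    """Return statement lines linking a person node to OCLC candidates.

    Assumption: a phrase match is a stronger signal than a word match,
    so phrase matches become oclc:probablyMatches and the remaining
    word matches become oclc:possiblyMatches.
    """
    lines = []
    seen = set()

    for uri in phrase_matches:
        if uri not in seen:
            seen.add(uri)
            lines.append(f"<{person}> oclc:probablyMatches <{uri}> .")

    for uri in word_matches:
        # Skip candidates already reported as the stronger kind of match.
        if uri not in seen:
            seen.add(uri)
            lines.append(f"<{person}> oclc:possiblyMatches <{uri}> .")

    return lines


if __name__ == "__main__":
    for line in match_triples("V", ["A1"], ["A1", "B1", "B2"]):
        print(line)
```

A candidate that shows up in both result lists is reported only once, 
under the stronger predicate.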
                     
 
Does this seem like a reasonable model for this relationship?  How do 
we match this up with the ArtStor records?  I imagine they should be 
matched by URI, with a similar linkage from each ArtStor record.

Cheers,
-kls          

-- 
========================================================
   Kevin Smathers                kevin.smathers@hp.com    
   Hewlett-Packard               kevin@ank.com            
   Palo Alto Research Lab                                 
   1501 Page Mill Rd.            650-857-4477 work        
   M/S 1135                      650-852-8186 fax         
   Palo Alto, CA 94304           510-247-1031 home        
========================================================
use "Standard::Disclaimer";
carp("This message was printed on 100% recycled bits.");

Received on Thursday, 30 October 2003 18:25:26 UTC