Re: Progress with ArtStor

>I've also taken a brief look at the ArtStor corpus.  Nothing by Frank 
>Lloyd Wright, nor I. M. Pei in their data, so I doubt that architecture is 
>in the corpus.

If you'd like to know what's in this dataset, I would take a look at their
website: http://www.artstor.org/collections/brief.jsp
Offhand, I'd think that the MoMA Architecture and Design collection contains
some architectural records, unless those were mysteriously left out of our
sample.

>May I return to my suggestion that we should not try to canonicalize 
>Person records, but instead use them as we find them?

It is certainly true that normalizing personal names is the road to madness.
As Mark pointed out earlier, there have been many past attempts to do this
(it's called name authority control). It's hard, requires human judgement,
and is subject to many exceptions, which is why it's so bloody expensive.
I wouldn't try to do it, myself.

Don't forget that part of this project is to work with OCLC to use their brand
new Web Service for name authority control. They even have a working
prototype up that is part of the submission process and very nifty. It only
fails if there are no matches in LCNAF or other national authority files, and
we're thinking about how to fix that for local authors (typical in DSpace) and
other name lists (like ULAN). They're very good at this, so let's let them do
the work :-)
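
To make the fallback idea concrete, here is a rough sketch in Python. It is
not OCLC's service or its API: the function names, the in-memory lists, and
the sample entries are all made up for illustration. It only shows the shape
of "try the national authority file first, then other lists like ULAN, then
local authors, and otherwise use the name as we found it".

```python
from typing import Optional, Tuple

# Hypothetical stand-ins for authority lists, checked in priority order.
# The entries are illustrative placeholders, not real authority records.
AUTHORITY_FILES = [
    ("LCNAF", {"wright, frank lloyd": "Wright, Frank Lloyd"}),
    ("ULAN", {"pei, i. m.": "Pei, I. M."}),
    ("local", {"author, example": "Author, Example"}),
]


def normalize_key(name: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(name.lower().split())


def lookup_name(name: str) -> Optional[Tuple[str, str]]:
    """Return (source, authorized form) from the first list that matches."""
    key = normalize_key(name)
    for source, entries in AUTHORITY_FILES:
        if key in entries:
            return source, entries[key]
    # No match anywhere: the caller keeps the name exactly as it was found.
    return None


def resolve(name: str) -> str:
    """Use the authorized form when one exists, else the name as found."""
    match = lookup_name(name)
    return match[1] if match else name
```

With this, resolve("  wright,  frank lloyd ") gives back the form from the
first list, and a name that matches nothing comes back unchanged. The real
matching is much harder than a dictionary lookup, of course, which is the
whole point of letting someone else do it.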

MacKenzie/



MacKenzie Smith
Associate Director for Technology
MIT Libraries
Building 14S-208
77 Massachusetts Avenue
Cambridge, MA  02139
(617)253-8184
kenzie@mit.edu 

Received on Wednesday, 15 October 2003 18:23:01 UTC