RE: Record Linkage in Simile

On Mon, 27 Oct 2003, Butler, Mark wrote:

> the FRBR ... specification, which doesn't just represent the collection
> (...group 1 entities) but also people and organizations who are related
> to the collection (group 2 entities) and concepts that are related to
> the collection (group 3 entities). ...  Now mappings between ... two
> schemas ... may not be helpful if we want to retrieve all the resources
> relating to a particular artist. The only way to do this is to do record
> linking between group 1, 2 and 3 entities.

This made me realize that I should clarify my use of the term "record
linkage".  This term has a long history in the database community for the
problem of "linking" records from different databases that represent
identical entities. Classic examples of this are matching up census
records from different files or matching medical records of patients from
different hospital databases.

There are a plethora of terms for this problem that vary with the
community and the data types. I've seen it called citation matching, data
cleaning/hardening, reference matching, authority work, and so forth. In
all of these, though, the entities being linked or matched are, in a
strong sense, identical: 'Nick Matsakis' in source A is the same real
world person as 'N. Matsakis' in source B.  This is why I'm concerned
we're not talking about the same thing when you refer to linking between
collections, people, and concepts (groups 1, 2, and 3 in FRBR). The
person 'Nick Matsakis' is not the same as a document he authored.

Of course, in the classical formulations the schemas are fixed. When
merging data from multiple schemas, there may be cases of mismatch.  For
example, the FRBR ontology is richer than that currently used by
libraries, and so a record for a book in a library may (or may not) be
matched with FRBR item, manifestation, expression, or work records,
depending on the application.  It is for this reason that I also think it
is good to think of "record linking" as a separate problem from "record
merging", which is what you do with the knowledge that two records are
"the same".
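
To make the linking/merging split concrete, here is a rough sketch in
Python. Everything in it is an assumption made for illustration: the
record layout, the function names (name_key, link, merge), and the
matching rule (same surname and first initial), which is far cruder
than anything a real system would use.

```python
def name_key(name):
    """Reduce a personal name to (first initial, surname), lowercased."""
    parts = name.replace(".", " ").split()
    return (parts[0][0].lower(), parts[-1].lower())


def link(a, b):
    """Linking: decide whether two records describe the same entity.

    Hypothetical rule: same surname and same first initial.
    """
    return name_key(a["name"]) == name_key(b["name"])


def merge(a, b):
    """Merging: combine two records already judged to be the same.

    Hypothetical policy: keep the longer form of the name and
    remember which sources contributed.
    """
    return {
        "name": max(a["name"], b["name"], key=len),
        "sources": [a["source"], b["source"]],
    }


# Made-up records standing in for entries from sources A and B.
rec_a = {"name": "Nick Matsakis", "source": "A"}
rec_b = {"name": "N. Matsakis", "source": "B"}

if link(rec_a, rec_b):
    print(merge(rec_a, rec_b))
```

The point of keeping link() and merge() apart is that the decision
("are these the same?") can be revisited or replaced without touching
the policy for what to do once the answer is yes.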

Nick

Received on Tuesday, 28 October 2003 14:12:31 UTC