- From: Nick Matsakis <matsakis@mit.edu>
- Date: Tue, 28 Oct 2003 14:12:25 -0500 (EST)
- To: "Butler, Mark" <Mark_Butler@hplb.hpl.hp.com>
- Cc: SIMILE public list <www-rdf-dspace@w3.org>
On Mon, 27 Oct 2003, Butler, Mark wrote:

> the FRBR ... specification, which doesn't just represent the collection
> (...group 1 entities) but also people and organizations who are related
> to the collection (group 2 entities) and concepts that are related to
> the collection (group 3 entities). ... Now mappings between ... two
> schemas ... may not be helpful if we want to retrieve all the resources
> relating to a particular artist. The only way to do this is to do record
> linking between group 1, 2 and 3 entities.

This made me realize that I should clarify my use of the term "record linkage". This term has a long history in the database community, where it names the problem of "linking" records from different databases that represent identical entities. Classic examples are matching up census records from different files, or matching medical records of patients from different hospital databases.

There is a plethora of terms for this problem that vary with the community and the data types. I've seen it called citation matching, data cleaning/hardening, reference matching, authority work, and so forth. In all of these, though, the entities being linked or matched are, in a strong sense, identical: 'Nick Matsakis' in source A is the same real-world person as 'N. Matsakis' in source B.

This is why I'm concerned we're not talking about the same thing when you refer to linking between collections, people, and concepts (groups 1, 2, and 3 in FRBR). The person 'Nick Matsakis' is not the same as a document he authored.

Of course, in the classical formulations the schemas are fixed. When merging data from multiple schemas, there may be cases of mismatch. For example, the FRBR ontology is richer than that currently used by libraries, and so a record for a book in a library may (or may not) be matched with FRBR item, manifestation, expression, or work records, depending on the application.

It is for this reason that I also think it is good to think of "record linking" as a separate problem from "record merging", which is what you do with the knowledge that two records are "the same".

Nick
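
P.S. To make the linking/merging distinction concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the matching rule (same surname and first initial), the function names `name_key`, `link`, and `merge`, and the `dept` and `phone` fields are all hypothetical, and real record linkage uses far more careful comparison than this.

```python
def name_key(name):
    """Reduce a personal name to (surname, first initial), lowercased.

    Hypothetical matching rule, chosen only to keep the example short.
    """
    parts = name.replace(".", " ").split()
    return (parts[-1].lower(), parts[0][0].lower())


def link(source_a, source_b):
    """Record linking: decide which records denote the same entity."""
    return [(a, b) for a in source_a for b in source_b
            if name_key(a["name"]) == name_key(b["name"])]


def merge(a, b):
    """Record merging: combine two records already known to match."""
    merged = dict(b)
    merged.update(a)  # on conflicting fields, keep the value from source A
    return merged


# Made-up records; only the two name spellings come from the message above.
source_a = [{"name": "Nick Matsakis", "dept": "example-dept"}]
source_b = [{"name": "N. Matsakis", "phone": "example-phone"},
            {"name": "Mark Butler", "phone": "other-phone"}]

pairs = link(source_a, source_b)              # step 1: which records match?
merged = [merge(a, b) for a, b in pairs]      # step 2: what to do about it
```

Note that `link` only ever pairs a person record with another person record. Relating a person to a document he authored is a different kind of link, and nothing in `merge` would make sense for it.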
Received on Tuesday, 28 October 2003 14:12:31 UTC