- From: Nick Matsakis <matsakis@mit.edu>
- Date: Mon, 27 Oct 2003 14:29:39 -0500 (EST)
- To: Kevin Smathers <kevin.smathers@hp.com>
- Cc: SIMILE public list <www-rdf-dspace@w3.org>
On Mon, 27 Oct 2003, Kevin Smathers wrote:

NM> My second requirement for ingestion software is that any record
NM> linkage it does, including name canonicalization, err on the side of
NM> caution. ... Of these, linking distinct entities is the more grave,
NM> for reasons I hope are obvious.

> Not sure I agree here. When performing a search it is usually better to
> get back extra information that wasn't requested than to miss data that
> was requested. A user can usually quickly sort out records that don't
> apply, so as long as the extra data is within a small fraction of the
> targeted data there is at least something to work with.

First, I am not suggesting that we never attempt a linkage that may
result in an incorrect match, but rather that such linkages never happen
in software that is simply intended to translate one metadata format to
RDF. In my terminology, ingesting should be simple, digesting can be
complex. The idea here is that we're going to need to write custom
software for each format that we want to import, but it would be nice if
the same frameworks could be used for identifying duplicates and
translating schema once the data is in RDF.

On the matter of whether false matches are worse than false misses, I
still think this is the case. If you give two distinct resources the
same URI, the result isn't that a user will get an irrelevant record as
a result of a search but rather that a user will get a relevant record
with incorrect information. This seems worse to me than getting back two
relevant records.

Nick
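The difference between a false miss and a false match can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the triples are plain tuples rather than a real RDF store, and the names `describe`, `ingest`, the `urn:example:` URIs, and the sample data are all hypothetical.

```python
from collections import defaultdict

def describe(triples):
    """Group (subject, predicate, object) triples into one record per subject URI."""
    records = defaultdict(lambda: defaultdict(set))
    for subject, predicate, obj in triples:
        records[subject][predicate].add(obj)
    return {s: {p: sorted(objs) for p, objs in props.items()}
            for s, props in records.items()}

# Hypothetical data: two distinct people who happen to share a name.
person_a = [("name", "J. Smith"), ("born", "1920"), ("wrote", "Book A")]
person_b = [("name", "J. Smith"), ("born", "1955"), ("wrote", "Book B")]

def ingest(uri_a, uri_b):
    """Assign each person a URI and return the resulting records."""
    triples = [(uri_a, p, o) for p, o in person_a]
    triples += [(uri_b, p, o) for p, o in person_b]
    return describe(triples)

# False miss: distinct URIs. A search on the name returns two records,
# each of which is internally correct.
false_miss = ingest("urn:example:smith-1", "urn:example:smith-2")

# False match: one shared URI. A search returns a single record that
# mixes facts about two different people.
false_match = ingest("urn:example:smith", "urn:example:smith")
```

With distinct URIs the user sees two records and can tell them apart by birth year. With a shared URI the single record carries both birth years and both books, and nothing in it says which facts belong to which person.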
Received on Monday, 27 October 2003 14:29:47 UTC