- From: Simon Spero <ses@unc.edu>
- Date: Wed, 23 Mar 2011 18:39:07 -0400
- To: public-lld <public-lld@w3.org>
- Message-ID: <AANLkTinZJAhEVuK6VCd41fy_NQeRwA950euoiLMUPoB6@mail.gmail.com>
A few brief notes:

1. Assigning identifiers that are guaranteed not to have been assigned by another agent does not require a centralized repository of all identifiers; it merely requires a partitioning of the name space such that no part of the name space is delegated to more than one agent.

2. Worldcat contains a great deal of duplication, where multiple records exist for the same manifestation. Usually these records will be brought into the same work-set, but sometimes a record is so badly constructed that a work-set is split, leaving a single ISBN corresponding to items in multiple work-sets. I first encountered this the first time I tried the experimental XISBN service; the item I scanned the ISBN from was a paperback edition of... the AACR2. Worldcat record numbers do not satisfy the Unique Name Assumption.

3. The economic model used for Worldcat has the unfortunate side effect of encouraging the proliferation of records that are bad enough to avoid de-duplication. Usually the best record dominates the holdings count, but there are still enough careless people downloading the incorrect records to make purging them (and merging them with the correct record) problematic.

4. Because there is no automatic way of synchronizing and pushing corrections from OCLC to member libraries with incorrect holdings, these errors can make it harder to make optimal choices when selecting targets for ILL requests.

5. The amount of bibliographic information is relatively small compared to that processed in data-intensive scientific disciplines. ~7M LC bibliographic records take up, on average, less than 180 bytes of disk space each when bzip2-compressed; 200M such records could fit comfortably on a smartphone flash card. This number of bits is easy to massively replicate.

6. For most normal people, the reason for wanting a bibliographic record to add to a catalog is that they have a copy of the item that they wish to process and put on the shelf.
For non-unique items, the probability of being the first to acquire a copy of a new manifestation is rather low (unless you're LC). However, the first records produced, especially if they're ONIX-derived, are typically of rather poor quality. A model of sharing that requests (but does not require) one to re-enter a value for a field chosen based on how likely that field is to contain an error may be more likely to asymptotically approach awesome than a model that treats entire records as valuable objects in their own right, to be bought and sold as chattels.

7. The advantage of a distributed system with local replicas, as opposed to a centralized system with one-time downloads, becomes especially apparent as records undergo continuous improvement. This is where systems like WorldCat Local and its successors can shine.
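The delegation scheme in point 1 can be sketched in a few lines. This is a hypothetical illustration, not any existing system's API: the only centralized step is delegating a prefix, and each agent then mints identifiers locally with no further coordination.

```python
import itertools

class Registry:
    """Delegates each name-space prefix to at most one agent.
    This is the only step that needs central coordination."""
    def __init__(self):
        self._delegated = set()

    def delegate(self, prefix):
        if prefix in self._delegated:
            raise ValueError(f"prefix {prefix!r} already delegated")
        self._delegated.add(prefix)
        return Agent(prefix)

class Agent:
    """Mints identifiers locally; global uniqueness follows from the
    uniqueness of the delegated prefix, not from a central ID registry."""
    def __init__(self, prefix):
        self._prefix = prefix
        self._counter = itertools.count(1)

    def mint(self):
        return f"{self._prefix}:{next(self._counter)}"

registry = Registry()
lc = registry.delegate("lc")     # illustrative prefixes
unc = registry.delegate("unc")
ids = [lc.mint(), lc.mint(), unc.mint()]
assert len(ids) == len(set(ids))  # no collisions across agents
```

The same structure underlies real identifier schemes (ISBN publisher ranges, DNS, OIDs): delegate disjoint subspaces once, then let agents assign freely within them.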
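The storage estimate in point 5 is easy to check as back-of-envelope arithmetic (using the 180-bytes-per-record figure above):

```python
# Back-of-envelope check of the storage claim in point 5.
bytes_per_record = 180           # bzip2-compressed, per the estimate above
lc_records = 7_000_000           # ~7M LC bibliographic records
all_records = 200_000_000        # 200M records

lc_total_gb = lc_records * bytes_per_record / 1e9
all_total_gb = all_records * bytes_per_record / 1e9
print(f"~7M LC records: {lc_total_gb:.2f} GB")   # ~1.26 GB
print(f"200M records:   {all_total_gb:.0f} GB")  # ~36 GB, well within a flash card
```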
Received on Wednesday, 23 March 2011 22:39:40 UTC