Re: Library data is expressed primarily as text strings from Tom Baker on 2011-09-04 (public-xg-lld@w3.org from September 2011)

From: Tom Baker <tbaker@tbaker.de>
Date: Sat, 3 Sep 2011 20:09:21 -0400
To: Karen Coyle <kcoyle@kcoyle.net>
Cc: public-xg-lld@w3.org
Message-ID: <20110904000921.GA15038@julius>

On Sat, Sep 03, 2011 at 01:09:58PM -0700, Karen Coyle wrote:
> Actually, none of the identifiers are coded as "data" (unique
> alphanumeric strings) rather than text. This is an entry for an
> ISBN:
> 
> 020 	__ $a 0375409726 (lg. print)
> 
> The "identifier" is in the same text string as the parenthetical
> phrase. Ditto the LCCN, the OCLC number, and others. So the way it
> is in a MARC record, the ISBN may not be entered as a unique string.
> Only the items in the fixed fields have that characteristic.
> Sometimes there isn't a parenthetical phrase in the 020 field, but
> that's an accident of the particular situation.
> 
> 020 	__ $a 1844134571
> 
> So in fact nearly all of the variable field data in MARC (with a few
> exceptions, but very few) are untyped text strings.

Oy. The text in front of me had been ("BEFORE"):

    Most information in library data is encoded as display-oriented text
    strings. There are a few shared identifiers for resources, such as ISBNs
    for books, but most identification is done with text strings. Some coded
    data fields are used in MARC records, but there is not a clear incentive to
    include these in all records, since most coded data fields are not used in
    library system functions.  
    
    Some data fields, such as authority controlled names and subjects, do have
    their own associated records in separate files.  These records have
    identifiers that could be used to represent those entities in library
    metadata. However, some of the data formats currently used do not support
    the inclusion of these identifiers in existing library records.
    Consequently, a number of current library system do not support their use
    properly.

...which I edited into ("AFTER"):

    Most information in library data is encoded as display-oriented text
    strings. Some of the resource identifiers used in library data are based on
    unique alphanumeric strings, such as ISBNs for books, but most
    identification is done using words and names. Some data fields in MARC
    records are coded uniquely, but there is no clear incentive to include
    these in all records as few of them are used for library-system functions.

    Some data fields, such as authority-controlled names and subjects, have
    associated records in separate files, and these records have identifiers
    that could be used to represent those entities in library metadata;
    however, the data formats in current use do not always support inclusion of
    these identifiers in records, so many of today's library systems do not
    properly support their use.

I'm not sure I understand the significance of Peter's distinction between
"encoding" and "storing"...?

I'm thinking that the re-write of the second block ("Some data fields...")
looks okay...?

But in light of your examples, Karen, neither "BEFORE" nor "AFTER" seems to get
the point of the first block quite right.  Specifically, neither version seems
to capture the point about encoding/storing/expressing ISBNs.
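
To make the issue concrete, here is a minimal Python sketch of what a data
consumer has to do with the two 020 $a values quoted above before the ISBN can
be used as an identifier.  The names split_020a and ISBN_A are hypothetical,
and the pattern is an assumption that covers only those two examples (no
hyphens, no checksum validation); it is not a description of how any library
system actually handles the field.

```python
import re

# Hypothetical pattern: a 10- or 13-character ISBN, optionally followed by a
# parenthetical qualifier, as in the two 020 $a examples quoted in this thread.
ISBN_A = re.compile(r"^\s*([0-9]{9}[0-9Xx]|[0-9]{13})\s*(?:\((.*)\))?\s*$")


def split_020a(value):
    """Split an 020 $a text string into (isbn, qualifier).

    Returns (None, None) when the string does not fit the assumed pattern.
    """
    match = ISBN_A.match(value)
    if match is None:
        return None, None
    return match.group(1).upper(), match.group(2)


print(split_020a("0375409726 (lg. print)"))  # ('0375409726', 'lg. print')
print(split_020a("1844134571"))              # ('1844134571', None)
```

The identifier only becomes a standalone value after this kind of string
handling, which seems to be the point about ISBNs that neither version of the
first block states.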

Can you suggest a clarification?  The section in question is [1].

Tom

[1] http://www.w3.org/2005/Incubator/lld/wiki/Draft_issues_page_take2#Library_data_is_expressed_primarily_as_text_strings
Received on Sunday, 4 September 2011 00:10:07 UTC