RE: Z39.50 character encoding from Liv Aasa Holm on 2002-03-01 (www-zig@w3.org from March 2002)

From: Liv Aasa Holm <Liv.A.Holm@jbi.hio.no>
Date: Fri, 1 Mar 2002 08:12:26 +0100
To: www-zig@w3.org
Message-ID: <3C7F29DC@p48-r508-1>
You are right about one thing: we have agreed upon character sets OUTSIDE the 
MARC formats.  So when we convert between the different MARC formats we 
usually know which character set is used.  But I think it is a major drawback 
of formats like MARC21 (or the former USMARC) that it specifies a character 
set AND that a record is not really a valid MARC21 record unless it is in this 
specific character set.  The format, i.e. fields and subfields, should not 
be tied to a character set.
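
To illustrate how closely the two are tied in MARC21: the leader of each 
record carries a coding-scheme flag at position 09 (blank for MARC-8, "a" 
for UCS/Unicode), so a receiver can read the character set from the record 
itself.  A minimal sketch in Python, assuming the record arrives as raw 
ISO 2709 bytes; the function and table names are hypothetical and not taken 
from any MARC library:

```python
# Hypothetical helper for illustration only; not part of any MARC toolkit.
# Assumption: `record` holds a raw ISO 2709 record, so the 24-byte leader
# comes first and leader position 09 is the byte at offset 9.
MARC21_CODING_SCHEMES = {
    b" ": "MARC-8",       # blank: the traditional MARC-8 repertoire
    b"a": "UCS/Unicode",  # "a": record is encoded in UCS/Unicode
}


def marc21_coding_scheme(record: bytes) -> str:
    """Return the coding scheme named by leader position 09."""
    if len(record) < 24:
        raise ValueError("record is shorter than the 24-byte leader")
    flag = record[9:10]  # one-byte slice, so it can be used as a dict key
    return MARC21_CODING_SCHEMES.get(flag, "unknown")
```

A format that carried no such flag, and no agreement outside the format, 
would leave the receiver with nothing to look up.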

DC is not only used with XML.  And, yes, it is a format.

Liv

===== Original Message from "Johan Zeeman" <joe.zeeman@tlcdelivers.com> at 
28.02.02 15:13
>----- Original Message -----
>From: "Liv Aasa Holm" <Liv.A.Holm@jbi.hio.no>
>To: <www-zig@w3.org>
>Sent: Thursday, February 28, 2002 1:59 AM
>Subject: RE: Z39.50 character encoding
>
>
>> Most MARC formats do NOT specify a character set.  Does DC?  I have at
>> least not seen it, but perhaps it is implicit?
>
>Which makes "most" MARCs even more broken than MARC21, with which at least
>you know what the character set is.  Otherwise you have to intuit the
>character set, and computers have not generally been praised for their
>intuitive powers.  UNIMARC certainly specifies character sets in
>considerable detail (basically, unless the record specifies something else
>using ISO 2022 mechanisms, it's ASCII [ISO 646 IRV, actually]).  In fact, I
>suspect that the assertion that "most" MARC formats do not specify a
>character set is incorrect.  They may specify one by reference to some other
>MARC, but they will specify a character set.  It may also be that ISO 2709
>specifies a default character set--I don't have the standard to hand to
>check.
>
>DC by itself is not a record syntax; it is a list of data elements.  To be a
>record syntax, the data elements need to be encoded using some scheme.  The
>one I know about is XML.  And XML explicitly uses UTF-8.
>
>j.
>
>>
>> Liv
>>
>> ===== Original Message from Ray Denenberg <rden@loc.gov> at 27.02.02 19:12
>> >Mike Taylor wrote:
>> >
>> >> Some kinds of object (e.g. USMARC) specify a character set, and
>> >> others (GRS-1) do not.  Those which do, we must respect.
>> >
>> >True, some do and some don't.
>> >
>> >Two questions we need to answer before we go much further (and I think we
>> >need help from the experts on these):
>> >
>> >(1) Is it clear exactly which do and which don't?
>> >(2) For those which "do", is it always the case that these will be
>> >transferred according to the native character encoding, or is it likely
>> >that clients will want records in utf-8, even in the case where the
>> >format specifies a native encoding?
>> >
>> >And I think (1) is the more important question. We can address (2) later.
>> >
>> >In other words, for any given format, is it always implicitly known to
>> >both parties (client and server) whether or not the format comes with a
>> >native encoding?  If so, then our problem is simplified. But if not, then
>> >I'm afraid Mike's philosophy "Those which do, we must respect" isn't going
>> >to work in practice.
>> >
>> >--Ray
>> ===== Comments by Liv.A.Holm@jbi.hio.no (Liv Aasa Holm) at 28.02.02 07:58
>>
>> *******************************************************
>> Liv A. Holm
>> associate professor
>> Oslo University college
>> faculty of journalism, library and information science
>> tel. +47-22-45-27-77
>> fax.:+47-22-45-26-05
>> *******************************************************
===== Comments by Liv.A.Holm@jbi.hio.no (Liv Aasa Holm) at 01.03.02 08:08

*******************************************************
Liv A. Holm
associate professor
Oslo University college
faculty of journalism, library and information science
tel. +47-22-45-27-77
fax.:+47-22-45-26-05
*******************************************************
Received on Friday, 1 March 2002 02:12:29 UTC