Re: Z39.50 character encoding from Johan Zeeman on 2002-02-28 (www-zig@w3.org from February 2002)

From: Johan Zeeman <joe.zeeman@tlcdelivers.com>
Date: Thu, 28 Feb 2002 09:13:20 -0500
To: <www-zig@w3.org>
Message-ID: <008101c1c062$139ae1d0$9539910c@unicity.tlcdelivers.com>

----- Original Message -----
From: "Liv Aasa Holm" <Liv.A.Holm@jbi.hio.no>
To: <www-zig@w3.org>
Sent: Thursday, February 28, 2002 1:59 AM
Subject: RE: Z39.50 character encoding

> Most MARC formats do NOT specify a character set.  Does DC?  At least I
> have not seen it, but perhaps it is implicit?

Which makes "most" MARCs even more broken than MARC21, with which at least
you know what the character set is.  Otherwise you have to intuit the
character set, and computers have not generally been praised for their
intuitive powers.  UNIMARC certainly specifies character sets in
considerable detail (basically, unless the record specifies something else
using ISO 2022 mechanisms, it's ASCII [ISO 646 IRV, actually]).  In fact, I
suspect that the assertion that "most" MARC formats do not specify a
character set is incorrect.  They may specify one by reference to some other
MARC, but they will specify a character set.  It may also be that ISO 2709
specifies a default character set--I don't have the standard to hand to
check.
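
To make the "default unless ISO 2022 says otherwise" rule concrete, here is a
minimal sketch in Python.  The function name and the three labels are my own
invention for illustration, not anything taken from UNIMARC or ISO 2022, and
it only checks for the presence of an escape byte; it does not interpret the
escape sequence itself, which a real implementation would have to do by
consulting the standards.

```python
ESC = 0x1B  # ISO 2022 escape sequences begin with this byte


def guess_field_charset(data: bytes) -> str:
    """Return a coarse, hypothetical label for how to read a field's bytes.

    Assumption: the default set is 7-bit ASCII (ISO 646 IRV) and any other
    set is announced with an ISO 2022 escape sequence.
    """
    if ESC in data:
        # Some other set has been switched in; which one depends on the
        # bytes following ESC, which this sketch does not decode.
        return "iso-2022-switched"
    if all(b < 0x80 for b in data):
        # No escape and nothing outside 7 bits: the default applies.
        return "ascii"
    # 8-bit data with no announcement: the receiver is left to "intuit".
    return "unknown"
```

The third branch is the case complained about above: bytes that no
declaration accounts for.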

DC by itself is not a record syntax; it is a list of data elements.  To be a
record syntax, the data elements need to be encoded using some scheme.  The
one I know about is XML.  And XML defaults to UTF-8 (or UTF-16) unless the
document declares another encoding.
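
As a small illustration of how an XML parser settles the encoding question
from the document itself, here is a sketch using Python's standard
xml.etree.ElementTree.  The bare "title" element is a made-up stand-in for a
DC record, not a real DC serialization, and title_text is a hypothetical
helper.

```python
import xml.etree.ElementTree as ET

# No encoding declaration: the parser assumes UTF-8.
utf8_doc = "<title>Bokmål</title>".encode("utf-8")

# Explicit declaration: the same text carried as ISO 8859-1 bytes.
latin1_doc = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><title>Bokmål</title>'
).encode("iso-8859-1")


def title_text(raw: bytes) -> str:
    """Parse raw XML bytes and return the root element's text."""
    return ET.fromstring(raw).text


# Both byte streams differ on the wire but decode to the same characters.
same = title_text(utf8_doc) == title_text(latin1_doc)
```

Either way the receiver never has to guess: the bytes plus the (possibly
absent) declaration determine the characters.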

j.

>
> Liv
>
> ===== Original Message from Ray Denenberg <rden@loc.gov> at 27.02.02 19:12
> >Mike Taylor wrote:
> >
> >> Some kinds of object (e.g. USMARC) specify a character set, and
> >> others (GRS-1) do not.  Those which do, we must respect.
> >
> >True, some do and some don't.
> >
> >Two questions we need to answer before we go much further (and I think we
> >need help from the experts on these):
> >
> >(1) Is it clear exactly which do and which don't?
> >(2) For those which "do", is it always the case that these will be
> >transferred according to the native character encoding or is it likely
> >that clients will want records in utf-8, even in the case where the
> >format specifies a native encoding?
> >
> >And I think (1) is the more important question. We can address (2) later.
> >
> >In other words for any given format, is it always implicitly known to
> >both parties (client and server) whether or not the format comes with a
> >native encoding?  If so then our problem is simplified. But if not, then
> >I'm afraid Mike's philosophy "Those which do, we must respect" isn't
> >going to work in practice.
> >
> >--Ray
> ===== Comments by Liv.A.Holm@jbi.hio.no (Liv Aasa Holm) at 28.02.02 07:58
>
> *******************************************************
> Liv A. Holm
> associate professor
> Oslo University college
> faculty of journalism, library and information science
> tel. +47-22-45-27-77
> fax.:+47-22-45-26-05
> *******************************************************

Received on Thursday, 28 February 2002 09:14:33 UTC