- From: Liv Aasa Holm <Liv.A.Holm@jbi.hio.no>
- Date: Fri, 1 Mar 2002 08:12:26 +0100
- To: www-zig@w3.org
You are right about one thing: we have agreed upon character sets OUTSIDE the MARC formats. So when we convert between the different MARC formats we usually know which character set is used. But I think it is a major drawback of formats like MARC21 (or the former USMARC) that it specifies a character set AND that a record is not really a valid MARC21 record unless it is in this specific character set. The format, i.e. fields and subfields, should not be tied to a character set.

DC is not only used with XML. And, yes, it is a format.

Liv

===== Original Message from "Johan Zeeman" <joe.zeeman@tlcdelivers.com> at 28.02.02 15:13
>----- Original Message -----
>From: "Liv Aasa Holm" <Liv.A.Holm@jbi.hio.no>
>To: <www-zig@w3.org>
>Sent: Thursday, February 28, 2002 1:59 AM
>Subject: RE: Z39.50 character encoding
>
>
>> Most MARC formats do NOT specify a character set. Does DC? I have at least
>> not seen it, but perhaps it is implicit?
>
>Which makes "most" MARCs even more broken than MARC21, with which at least
>you know what the character set is. Otherwise you have to intuit the
>character set, and computers have not generally been praised for their
>intuitive powers. UNIMARC certainly specifies character sets in
>considerable detail (basically, unless the record specifies something else
>using ISO 2022 mechanisms, it's ASCII [ISO 646 IRV, actually]). In fact, I
>suspect that the assertion that "most" MARC formats do not specify a
>character set is incorrect. They may specify one by reference to some other
>MARC, but they will specify a character set. It may also be that ISO 2709
>specifies a default character set--I don't have the standard to hand to
>check.
>
>DC by itself is not a record syntax; it is a list of data elements. To be a
>record syntax, the data elements need to be encoded using some scheme. The
>one I know about is XML. And XML explicitly uses UTF-8.
>
>j.
>
>>
>> Liv
>>
>> ===== Original Message from Ray Denenberg <rden@loc.gov> at 27.02.02 19:12
>> >Mike Taylor wrote:
>> >
>> >> Some kinds of object (e.g. USMARC) specify a character set, and
>> >> others (GRS-1) do not. Those which do, we must respect.
>> >
>> >True, some do and some don't.
>> >
>> >Two questions we need to answer before we go much further (and I think we
>> >need help from the experts on these):
>> >
>> >(1) Is it clear exactly which do and which don't?
>> >(2) For those which "do", is it always the case that these will be
>> >transferred according to the native character encoding, or is it likely that
>> >clients will want records in utf-8, even in the case where the format
>> >specifies a native encoding?
>> >
>> >And I think (1) is the more important question. We can address (2) later.
>> >
>> >In other words, for any given format, is it always implicitly known to both
>> >parties (client and server) whether or not the format comes with a native
>> >encoding? If so, then our problem is simplified. But if not, then I'm afraid
>> >Mike's philosophy "Those which do, we must respect" isn't going to work in
>> >practice.
>> >
>> >--Ray
>> ===== Comments by Liv.A.Holm@jbi.hio.no (Liv Aasa Holm) at 28.02.02 07:58
>>
>> *******************************************************
>> Liv A. Holm
>> associate professor
>> Oslo University college
>> faculty of journalism, library and information science
>> tel. +47-22-45-27-77
>> fax.:+47-22-45-26-05
>> *******************************************************

===== Comments by Liv.A.Holm@jbi.hio.no (Liv Aasa Holm) at 01.03.02 08:08

*******************************************************
Liv A. Holm
associate professor
Oslo University college
faculty of journalism, library and information science
tel. +47-22-45-27-77
fax.:+47-22-45-26-05
*******************************************************
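
Editorial illustration of the point debated in this thread, that with MARC21 "at least you know what the character set is". This is a minimal sketch, not anything proposed by the posters. It assumes the MARC 21 convention that leader position 09 declares the character coding scheme (blank for MARC-8, "a" for UCS/Unicode) and that the record starts with the 24-byte ISO 2709 leader. The function name and the returned labels are hypothetical.

```python
# Sketch only: function name and returned labels are illustrative assumptions.
LEADER_LENGTH = 24        # an ISO 2709 record begins with a 24-byte leader
CODING_SCHEME_OFFSET = 9  # MARC 21 leader position 09: character coding scheme


def marc21_declared_encoding(record: bytes) -> str:
    """Return the character encoding a MARC 21 record declares for itself."""
    if len(record) < LEADER_LENGTH:
        raise ValueError("record is shorter than the 24-byte leader")
    flag = record[CODING_SCHEME_OFFSET:CODING_SCHEME_OFFSET + 1]
    if flag == b" ":
        return "MARC-8"   # blank: MARC-8, switched via ISO 2022 escape sequences
    if flag == b"a":
        return "UTF-8"    # 'a': UCS/Unicode
    # Any other value is not a declaration this sketch understands; a format
    # without such a flag would leave the receiver guessing in the same way.
    raise ValueError(f"unrecognised coding scheme flag: {flag!r}")


if __name__ == "__main__":
    # Build a dummy leader rather than a real record; only byte 9 matters here.
    leader = bytearray(b"0" * LEADER_LENGTH)
    leader[CODING_SCHEME_OFFSET:CODING_SCHEME_OFFSET + 1] = b"a"
    print(marc21_declared_encoding(bytes(leader)))
```

A format that carries no such declaration gives the receiving side nothing to branch on, which is the situation Ray's question (1) is trying to pin down.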
Received on Friday, 1 March 2002 02:12:29 UTC