Re: SV: native encoding from Rustam T. Usmanov on 2002-03-02 (www-zig@w3.org from March 2002)

From: Rustam T. Usmanov <rustam@unilib.neva.ru>
Date: Sat, 2 Mar 2002 14:34:41 +0300 (MSK)
To: www-zig@w3.org
Message-ID: <Pine.ULT.3.96.1020302135552.15966B-100000@STPULX.unilib.neva.ru>

On Fri, 1 Mar 2002, Henrik Dahl wrote:

> I have a question related to the character set issues some of you have
> raised regarding MARC records. In danMARC2 the value of a subfield
> containing the title may look like this:
> 
> "The <alphabetizationcharacter>ugly duckling"
> 
> In this way we know that "The ugly duckling" should be sorted as
> "ugly duckling". The <alphabetizationcharacter> is a specific byte that has
> simply been chosen for this special role.
> 
> Isn't it correct that conversion to some general character set such as pure
> UTF-8, UNICODE or the like will simply make the <alphabetizationcharacter>
> go away, since such a character just doesn't exist in either UTF-8 or
> UNICODE? "The ugly duckling" would then simply be sorted as "The ugly
> duckling", which no librarian in, e.g., Denmark will accept.
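The sorting rule described in the question can be sketched in a few lines of Python. This is a minimal illustration, not danMARC2's actual algorithm: the post does not name the marker byte, so `ALPHA_MARK` (U+0088) is a hypothetical placeholder, and `sort_key`, `display` and the sample titles are likewise made up for the example.

```python
# Hypothetical placeholder: the post does not say which byte danMARC2 uses
# for <alphabetizationcharacter>, so U+0088 is only a stand-in.
ALPHA_MARK = "\u0088"


def sort_key(title: str) -> str:
    """Return the part of the title used for sorting.

    Text before the marker is non-sorting and the marker itself is dropped;
    a title without a marker sorts on its full text.
    """
    _, sep, tail = title.partition(ALPHA_MARK)
    return tail if sep else title


def display(title: str) -> str:
    """Return the title for display, with the marker removed."""
    return title.replace(ALPHA_MARK, "")


# Sample titles for illustration only.
titles = ["The " + ALPHA_MARK + "ugly duckling", "Snow queen", "Tinderbox"]

for t in sorted(titles, key=lambda t: sort_key(t).casefold()):
    print(display(t))
# Snow queen
# Tinderbox
# The ugly duckling
```

If the marker were lost in conversion, "The ugly duckling" would sort under "T", before "Tinderbox", which is exactly the result the question says librarians would reject.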

It depends on the converter, the input character set, the output character
set and the value of that <alphabetizationcharacter> byte. In UNIMARC such
markers are called "non-sorting characters"; they are defined by ISO 6630
and remain unchanged when transforming to or from UNICODE. Note that UTF-8
is just an encoding form of UNICODE: whether a character exists is defined
by UNICODE, not by UTF-8.
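How much depends on the converter can be shown with a small Python sketch. It assumes, purely for illustration, that the marker is the byte 0x88; the post does not say which byte danMARC2 uses, and no particular ISO 6630 code value is assumed here. Read as ISO 8859-1, the byte becomes the control character U+0088 and survives a UTF-8 round trip; read as Windows-1252, the same byte turns into a visible circumflex; and a converter that strips control characters drops it altogether.

```python
import unicodedata

# Hypothetical raw subfield bytes: 0x88 stands in for <alphabetizationcharacter>;
# the real danMARC2 byte value is not given in the post.
raw = b"The \x88ugly duckling"

# ISO 8859-1 maps each byte to the code point with the same value,
# so 0x88 becomes U+0088, a C1 control character.
latin1 = raw.decode("latin-1")

# U+0088 is a valid code point, so UTF-8 encoding and decoding preserve it
# (it is stored as the two bytes 0xC2 0x88).
utf8 = latin1.encode("utf-8")
roundtrip = utf8.decode("utf-8")

# Read as Windows-1252, the same byte is the printable modifier letter
# circumflex (U+02C6): the marker silently turns into visible text.
cp1252 = raw.decode("cp1252")

# A converter that discards control characters (category "Cc") loses it.
stripped = "".join(ch for ch in latin1 if unicodedata.category(ch) != "Cc")

print(repr(roundtrip))  # 'The \x88ugly duckling'
print(ascii(cp1252))    # 'The \u02c6ugly duckling'
print(stripped)         # The ugly duckling
```

Only the first path keeps the marker intact; the other two are the kind of conversion that produces the sorting problem described in the question.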

--
Rustam Usmanov, systems engineer
Open Library Systems Center, St.Petersburg State Technical University
Address:  29, Politekhnitcheskaya str., St.Petersburg, 195251, Russia
Tel/fax: +7 812 552 7654        <URL:http://www.unilib.neva.ru/olsc/>

Received on Saturday, 2 March 2002 06:34:42 UTC