- From: Rustam T. Usmanov <rustam@unilib.neva.ru>
- Date: Sat, 2 Mar 2002 14:34:41 +0300 (MSK)
- To: www-zig@w3.org
On Fri, 1 Mar 2002, Henrik Dahl wrote:

> I've got a question in scope of the characterset consideration which some of
> you have regarding MARC records. In danMARC2 the value of a subfield
> containing the title may look like this:
>
> "The <alphabetizationcharacter>ugly duckling"
>
> In this way we know that the "The ugly duckling" should be e.g. sorted as
> "ugly duckling". The <alphabetizationcharacter> is a certain byte which has
> just been chosen for this prosperous role.
>
> Isn't it correct, that conversion to some general characterset as pure
> UTF-8, UNICODE or the like will simply force the <alphabetizationcharacter>
> to go away, as such a character just doesn't exist in neither UTF-8 nor in
> UNICODE, making "The ugly duckling" simply e.g. sorted as "The ugly
> duckling", which no librarian in e.g. Denmark will accept.

It depends on the convertor, the input character set, the output character
set, and the value of that <alphabetizationcharacter> byte.

In UNIMARC this is handled by the "non-sorting characters", which are defined
in ISO 6630 and remain unchanged when converting to or from UNICODE.

Note that UTF-8 is only an encoding form of UNICODE: whether a character
exists is defined by UNICODE, not by UTF-8.

-- 
Rustam Usmanov, systems engineer
Open Library Systems Center, St.Petersburg State Technical University
Address: 29, Politekhnitcheskaya str., St.Petersburg, 195251, Russia
Tel/fax: +7 812 552 7654
<URL:http://www.unilib.neva.ru/olsc/>
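To illustrate the point about non-sorting characters surviving conversion, here is a minimal Python sketch. It assumes the convertor maps the ISO 6630 non-sort begin/end markers (NSB/NSE) to a pair of C1 control code points; the specific values U+0088 and U+0089 below are an assumption for illustration only, so check what your own convertor actually emits. The helper names `sort_key` and `display_form` are hypothetical. The sketch shows that such control characters are ordinary Unicode code points (so a UTF-8 round trip keeps them) and that a sort key can skip the text they bracket.

```python
import re

# ASSUMPTION: code points the convertor uses for the ISO 6630 NSB/NSE pair.
# The real mapping depends on the convertor in use.
NSB = "\u0088"  # hypothetical non-sort begin marker
NSE = "\u0089"  # hypothetical non-sort end marker

# Matches one NSB...NSE span (non-greedy, so multiple spans are handled).
_NON_SORTING = re.compile(re.escape(NSB) + ".*?" + re.escape(NSE), re.DOTALL)


def sort_key(title: str) -> str:
    """Drop every NSB...NSE span, then casefold for comparison."""
    return _NON_SORTING.sub("", title).casefold()


def display_form(title: str) -> str:
    """Remove only the markers, keeping the non-sorting text visible."""
    return title.replace(NSB, "").replace(NSE, "")


titles = [
    NSB + "The " + NSE + "ugly duckling",
    "Snow queen",
    NSB + "A " + NSE + "little mermaid",
]

# C1 control code points are ordinary Unicode characters, so they survive
# a UTF-8 encode/decode round trip unchanged.
assert all(t.encode("utf-8").decode("utf-8") == t for t in titles)

for t in sorted(titles, key=sort_key):
    print(display_form(t))
```

Here "The ugly duckling" is filed under "ugly duckling" while still being displayed with its leading article. A danMARC2 convertor would need an equivalent mapping for its own alphabetization byte; whether it has one is exactly the "it depends on the convertor" question above.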
Received on Saturday, 2 March 2002 06:34:42 UTC