
SV: native encoding

From: Henrik Dahl <hdahl@inet.uni2.dk>
Date: Fri, 1 Mar 2002 17:25:42 +0100
To: <www-zig@w3.org>
Message-ID: <001201c1c13d$bbd74e20$0301a8c0@hdthinkpada22p>

I have a question related to the character set considerations some of you
have raised regarding MARC records. In danMARC2, the value of a subfield
containing a title may look like this:

"The <alphabetizationcharacter>ugly duckling"

This tells us that "The ugly duckling" should be sorted as, e.g., "ugly
duckling". The <alphabetizationcharacter> is a particular byte that has
simply been chosen for this role.
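
To make the mechanism concrete, here is a minimal sketch in Python of how such a marker could drive sorting. The marker value and the function names are hypothetical, chosen only for illustration; the actual danMARC2 byte is not given here, and a real system may well handle this differently.

```python
# Hypothetical stand-in for <alphabetizationcharacter>; the real danMARC2
# byte is not specified in this message.
MARKER = "\x01"


def filing_key(title: str, marker: str = MARKER) -> str:
    """Return the sort key: the text after the marker, or the whole title
    when the marker is absent. Case is folded for comparison."""
    _, found, rest = title.partition(marker)
    return (rest if found else title).casefold()


def display_form(title: str, marker: str = MARKER) -> str:
    """Return the title as shown to a user, with the marker removed."""
    return title.replace(marker, "")


titles = [
    "The " + MARKER + "ugly duckling",
    "Thumbelina",
    "The " + MARKER + "snow queen",
]

# Sort on the filing key, then strip the marker for display.
sorted_titles = [display_form(t) for t in sorted(titles, key=filing_key)]
```

With the marker in place, "The ugly duckling" files under "u" rather than "t", while the displayed title keeps its leading article.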

Isn't it correct that conversion to a general character set such as pure
UTF-8, Unicode or the like will simply force the <alphabetizationcharacter>
to go away, since no such character exists in either UTF-8 or Unicode?
"The ugly duckling" would then simply be sorted as "The ugly duckling",
which no librarian in, e.g., Denmark will accept.
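
The concern can be sketched as follows, assuming a converter that silently drops any byte it cannot map. The marker byte 0x88 and the function name are hypothetical, and the ASCII decode merely stands in for whatever conversion a real system would perform.

```python
# Hypothetical marker byte, assumed to have no mapping in the target
# character set.
MARKER_BYTE = b"\x88"


def naive_convert(raw: bytes) -> str:
    """Decode as ASCII and silently drop every byte that cannot be mapped.
    This stands in for a conversion that has no equivalent for the marker."""
    return raw.decode("ascii", errors="ignore")


records = [
    b"Thumbelina",
    b"The " + MARKER_BYTE + b"ugly duckling",
]

converted = [naive_convert(r) for r in records]

# After conversion the marker is gone, so the only key left is the full
# title, and the article "The" now takes part in the ordering.
sorted_after_conversion = sorted(converted, key=str.casefold)
```

Here "The ugly duckling" ends up before "Thumbelina", whereas filing on "ugly duckling" would have placed it after.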

Best regards,

Henrik Dahl

-----Original message-----
From: www-zig-request@w3.org [mailto:www-zig-request@w3.org] On behalf of
Ray Denenberg
Sent: Friday, March 01, 2002 5:01 PM
To: www-zig@w3.org
Cc: zig
Subject: Re: native encoding

"LeVan,Ralph" wrote:

> We are going to have to profile (through an implementors agreement) which
> record syntaxes the UTF-8 negotiation applies to.

No, no, no!  Please, no profiling or implementor agreements!  If the
is that there are going to be certain syntaxes that the utf-8 negotiation
not apply to, then let's nail the list down, and include it in the
We do implementor agreements after the fact, because we discover that a
definition isn't sufficient.

Received on Friday, 1 March 2002 11:25:22 UTC
