- From: Henrik Dahl <hdahl@inet.uni2.dk>
- Date: Fri, 1 Mar 2002 17:25:42 +0100
- To: <www-zig@w3.org>
Hello!

I have a question related to the character set discussion some of you have been having about MARC records.

In danMARC2 the value of a subfield containing the title may look like this:

"The <alphabetizationcharacter>ugly duckling"

This tells us that "The ugly duckling" should be sorted as, e.g., "ugly duckling". The <alphabetizationcharacter> is a particular byte that has simply been chosen for this role.

Isn't it correct that conversion to a general character set such as pure UTF-8, Unicode or the like will simply force the <alphabetizationcharacter> to go away, since such a character exists in neither UTF-8 nor Unicode? "The ugly duckling" would then simply be sorted as "The ugly duckling", which no librarian in, e.g., Denmark will accept.

Best regards,

Henrik Dahl

-----Original Message-----
From: www-zig-request@w3.org [mailto:www-zig-request@w3.org] On Behalf Of Ray Denenberg
Sent: Friday, March 01, 2002 5:01 PM
To: www-zig@w3.org
Cc: zig
Subject: Re: native encoding

"LeVan,Ralph" wrote:

> We are going to have to profile (through an implementors agreement) which
> record syntaxes the UTF-8 negotiation applies to.

No, no, no! Please, no profiling or implementor agreements! If the sentiment is that there are going to be certain syntaxes that the UTF-8 negotiation will not apply to, then let's nail the list down, and include it in the definition. We do implementor agreements after the fact, because we discover that a definition isn't sufficient.

--Ray
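
To illustrate the sorting convention Henrik describes at the top of this message, here is a minimal Python sketch. The message does not say which byte danMARC2 uses, so `MARKER` is a hypothetical stand-in, and `sort_key` and `display_form` are illustrative names, not part of any MARC toolkit. It only shows how the sort order changes once the marker is lost; it makes no claim about what a real character set conversion does.

```python
# Hypothetical stand-in for the <alphabetizationcharacter>; the actual
# danMARC2 byte value is not given in the message, so this is an assumption.
MARKER = "\ue000"


def sort_key(title: str) -> str:
    """Return the text after the marker, or the whole title if none is present."""
    _, sep, tail = title.partition(MARKER)
    return (tail if sep else title).casefold()


def display_form(title: str) -> str:
    """Return the title as shown to a reader, with the marker removed."""
    return title.replace(MARKER, "")


titles = [
    "The " + MARKER + "ugly duckling",
    "Thumbelina",
    "The " + MARKER + "snow queen",
]

# Marker preserved: the leading article is skipped when sorting.
with_marker = sorted(titles, key=sort_key)

# Marker lost before sorting: every title is sorted from its first character.
without_marker = sorted((display_form(t) for t in titles), key=sort_key)

print([display_form(t) for t in with_marker])
print(without_marker)
```

With the marker intact, the two titles beginning with "The" are filed under "snow" and "ugly"; once it is stripped, both are filed under "The", which is the outcome the message warns about.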
Received on Friday, 1 March 2002 11:25:22 UTC