- From: Babich, Alan <ABabich@filenet.com>
- Date: Thu, 4 May 2000 11:39:14 -0700
- To: www-webdav-dasl@w3.org
Reference them in the DASL spec. There is really no other choice.

Alan Babich

-----Original Message-----
From: Jim Whitehead [mailto:ejw@ics.uci.edu]
Sent: Wednesday, May 03, 2000 4:56 PM
To: infonuovo@email.com; www-webdav-dasl@w3.org; duerst@w3.org
Subject: RE: JW24a (i18n sort ordering) - Unicode 3.0

Hmm, just had a brief brainstorm. Since it's extremely unlikely that the DASL WG are the first people ever to encounter this issue, I decided to do a little more sleuthing on the Web. Invoking the awesome powers of Google, I ran across a reference to ISO/IEC 14651, which apparently addresses internationalized sort orderings. The discussion is in:

http://www.stri.is/TC304/EOR/report.html

This led me to the ISO/IEC working group developing 14651:

http://anubis.dkuug.dk/JTC1/SC22/WG20/

Officially known as "JTC1/SC22/WG20 - Internationalization" (say that 10 times fast :-)

This page has a link to the current working draft, which looks like it might be exactly what we need (well, except for the fact that it doesn't appear to be approved yet). Quoting from the Introduction:

This International Standard defines:

- A reference comparison method applicable to two character strings in order to determine their respective order in a sorted list. The method can be applied on strings exploiting the full repertoire of ISO/IEC 10646-1. This method is also applicable to subsets of that repertoire, such as, for example, those of the different ISO/IEC 8-bit standard character sets, or any other character set, standardized or private, to produce ordering results valid (after tailoring) for a given set of languages for each script. This method uses transformation tables derived either from the Common Template Table defined in this International Standard or from one of its tailorings.
- A reference format, using the Backus-Naur Form (BNF) to describe the Common Template Table used normatively in this International Standard.
- A specific Common Template Table used by the reference comparison method. This table describes a basic order for all characters encoded in the first edition of ISO/IEC 10646-1 up to Amendment 7. It allows for a further specification of a fully deterministic ordering. The table is a starting point for enabling the specification of an international string ordering adapted to different cultures, without requiring an implementor to have a knowledge of all the different scripts already encoded in the UCS.

NOTE 1: This Common Template Table may be modified with a minimum of effort to suit the needs of a local environment. The main benefit, worldwide, is that for other scripts, no modification should be required and that the order will remain as consistent as possible and predictable from an international point of view.

NOTE 2: The character repertoire described in this International Standard is equivalent to that of the Unicode Standard Version 2.1.

I took a quick look through this document, and it has the nice quality that it is intended to be normative, and deals with all of ISO 10646. Since it isn't approved yet, we probably shouldn't make following it a MUST requirement, but it'll go a long way just to point out that this resource is available.

In my Web searches, I also ran across:

Unicode Collation Algorithm, Unicode TR#10
http://www.unicode.org/unicode/reports/tr10/

This TR has the nice quality that it has been "Approved", though it is not considered to be part of Unicode 3.0. According to the Unicode site (http://www.unicode.org/unicode/reports/):

"APPROVED: A technical report that is approved, but not considered part of the Unicode Standard, Version 3.0, must be separately referenced if it is cited. Approved technical reports can be normative. This means that implementations can claim conformance to them. At the current time, the specifications in approved technical reports are provided as information and guidance to implementers of the Unicode Standard, but do not form part of the Standard itself. The Unicode Technical Committee may decide to incorporate all or part of the material of such technical reports into a future version of the Unicode Standard, either as informative or as normative specification."

So, it appears we have two resources to draw upon for sort ordering. One question that still remains is whether we should normatively reference either, or just make them recommended reading when implementors start running into these problems.

Thoughts?

- Jim
Received on Thursday, 4 May 2000 14:42:11 UTC