- From: Charles McCathieNevile <charles@w3.org>
- Date: Sat, 1 May 2004 07:23:58 -0400 (EDT)
- To: public-esw-thes@w3.org
On Sat, 1 May 2004, Leonard Will wrote:

>
>In message <20040501102437.GB20642@homer.w3.org> on Sat, 1 May 2004, Dan
>Brickley <danbri@w3.org> wrote
>>
>>Regarding this specific proposal, it might be worth thinking about
>>how "an alphabetical list of terms" works in an internationalised
>>context. Do all scripts (eg. Japanese kanji, kana?) define a sort order?
>
>Surely they must if they have dictionaries, catalogues and telephone
>directories. I'm sure there will be standards for this, but I haven't
>looked them up.

In general, languages define sorting order(s). For example, where accented Latin characters appear depends on the language (Icelandic and Vietnamese share some distinctive characters but sort them differently, as do Greenlandic and the Yolngu-Matha group of Australian languages). There are several ways of sorting Japanese, based on kanji features or on kana transliterations, depending for example on what the use case is.

These strike me as very implementation-dependent details. As Dan said in the thread, we probably should not be specifying this normatively (although we can note it in best practices and such). In some cases a new implementation will want to create a new ordering approach (such as mixing Japanese with French by using a romaji version of the Japanese term as the sorting key, to pick an example off the top of my head). In other cases there will be no really good way of doing things - transliteration of Arabic into Latin characters varies so widely that a term may need to appear twice, under different transliterations, to help people find it (Osama, Ousama and Usama are all common transliterations of the same Arabic name, to pick a random example).

Mostly one would expect people developing a system for a language to know how to sort in that language, or to make the effort of finding out.

cheers

Chaals

>>How about datasets that mix Japanese with English (multilingual
>>thesauri, etc.)?
>
>Leonard
>
>P.S. As I assume that we are all on the <public-esw-thes@w3.org>
>discussion list, I think it is best not to send duplicate copies to
>private email addresses, which just need to be checked to see whether
>they are really duplicates and then deleted.

This is true, but for various reasons it tends to be a pain and people forget to do it...
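
To make the romaji sorting-key idea concrete, here is a rough Python sketch of how a mixed Japanese/French term list might be ordered. Everything in it is an assumption for illustration: the ROMAJI table and the sort_key function are hypothetical names, the readings are hard-coded rather than taken from real thesaurus data, and stripping accents is a simplification that does not follow proper French collation rules.

```python
import unicodedata

# Hypothetical romaji readings. A real system would take these from the
# thesaurus data or a transliteration tool, not from a hard-coded table.
ROMAJI = {
    "猫": "neko",
    "犬": "inu",
    "図書館": "toshokan",
}


def sort_key(term: str) -> str:
    """Return a Latin-script sorting key for a term.

    Uses the romaji reading when one is known, otherwise the term itself,
    then strips combining accents and folds case.
    """
    latin = ROMAJI.get(term, term)
    decomposed = unicodedata.normalize("NFD", latin)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


terms = ["école", "猫", "chien", "犬", "図書館"]
ordered = sorted(terms, key=sort_key)
print(ordered)  # ['chien', 'école', '犬', '猫', '図書館']
```

A Japanese term with no entry in the table keeps its original characters as the key and so sorts after all the Latin-script keys, which is one reason the choice of ordering is better left to the implementation than specified normatively.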
Received on Saturday, 1 May 2004 07:24:37 UTC