Re: URI policy for thesaurus concepts

On Sat, 1 May 2004, Leonard Will wrote:

>
>In message <20040501102437.GB20642@homer.w3.org> on Sat, 1 May 2004, Dan
>Brickley <danbri@w3.org> wrote
>>
>>Regarding this specific proposal, it might be worth thinking about
>>how "an alphabetical list of terms" works in an internationalised
>>context. Do all scripts (eg. Japanese kanji, kana?) define a sort order?
>
>Surely they must if they have dictionaries, catalogues and telephone
>directories. I'm sure there will be standards for this, but I haven't
>looked them up.

In general, languages define their own sorting order(s). For example, where accented
Latin characters appear depends on the language (Icelandic and Vietnamese
share some distinctive characters but sort them differently, as do
Greenlandic and the Yolngu-Matha group of Australian languages). There are
several ways of sorting Japanese, based on kanji features (such as radical
and stroke count) or on kana transliterations, depending, for example, on the
use case.
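
To make the point about language-dependent ordering concrete, here is a
minimal Python sketch (standard library only) that orders the same strings
three ways: raw code-point order, an Icelandic-style order in which letters
such as á and ð are separate letters of the alphabet, and an
accent-insensitive order in which accents only break ties. The alphabet
table, the function names and the sample strings are illustrative
assumptions, not a complete implementation of any standard collation. A real
system would more likely rely on the Unicode Collation Algorithm with locale
tailorings (as implemented by ICU, for instance) than on hand-built tables.

```python
import unicodedata

# Icelandic alphabet (32 letters) in dictionary order. Accented vowels,
# eth (ð), thorn (þ), ash (æ) and ö are letters in their own right,
# not variants of a base letter.
ICELANDIC_ALPHABET = "aábdðeéfghiíjklmnoóprstuúvxyýþæö"
_RANK = {ch: i for i, ch in enumerate(ICELANDIC_ALPHABET)}


def icelandic_key(word: str) -> tuple[int, ...]:
    """Rank each character by its position in the Icelandic alphabet.

    Characters not in the table (e.g. 'c' or 'z' in loanwords) sort after
    all Icelandic letters, by code point. Simplified sketch only.
    """
    text = unicodedata.normalize("NFC", word.lower())
    n = len(ICELANDIC_ALPHABET)
    return tuple(_RANK.get(ch, n + ord(ch)) for ch in text)


def accent_insensitive_key(word: str) -> tuple[str, str]:
    """Compare base letters first; accents only break ties.

    Combining marks are stripped after NFD decomposition, so 'á' compares
    as 'a'. Letters with no decomposition (such as 'ð') keep their
    code-point position.
    """
    decomposed = unicodedata.normalize("NFD", word.lower())
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base, word.lower())


WORDS = ["efni", "eða", "ába", "ala"]

if __name__ == "__main__":
    print("code point:        ", sorted(WORDS))
    print("Icelandic-style:   ", sorted(WORDS, key=icelandic_key))
    print("accent-insensitive:", sorted(WORDS, key=accent_insensitive_key))
```

With these sample strings the three keys give three different orders: code
point puts "ába" last because á (U+00E1) comes after z, the Icelandic-style
key puts it directly after the a-words, and the accent-insensitive key puts
it first because "aba" sorts before "ala".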

These strike me as very implementation-dependent details. As Dan said in the
thread, we probably should not specify this normatively (although we can note
it in best practices and the like). In some cases a new implementation will
want to create a new ordering approach (for instance, mixing Japanese with
French by using a romaji version of the Japanese term as the sorting key, to
pick an example off the top of my head). In other cases there will be no
really good way of doing things: transliteration of Arabic into Latin
characters varies so widely that a term may appear more than once, in
different transliterations, to help people find it (Osama, Ousama and Usama
are all common transliterations of the same Arabic name, to pick a random
example). Mostly, one would expect people developing a system for a language
to know how to sort in that language, or to make the effort to find out.
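
The two strategies mentioned above, attaching an explicit sort key (such as a
romaji reading) to each label, and listing every common transliteration as
its own entry point, can be sketched together in Python. The `Concept` class,
the `alphabetical_index` function and the sample data are hypothetical; the
"USE" notation follows the usual thesaurus convention for pointing from a
non-preferred term to the preferred one. Sort keys are supplied by hand here;
in practice they would come from whatever reading or transliteration scheme
the implementer chooses.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical thesaurus concept with hand-supplied sort keys."""
    pref_label: str
    sort_key: str  # e.g. a romaji reading for a Japanese label
    # Non-preferred labels (e.g. alternative transliterations) -> sort key
    alt_labels: dict[str, str] = field(default_factory=dict)


def alphabetical_index(concepts: list[Concept]) -> list[str]:
    """Return index lines ordered by sort key, not by the label's own script.

    Each variant label becomes its own entry, pointing at the preferred
    label with the thesaurus convention 'X USE Y'.
    """
    rows: list[tuple[str, str]] = []
    for concept in concepts:
        rows.append((concept.sort_key, concept.pref_label))
        for alt, alt_key in concept.alt_labels.items():
            rows.append((alt_key, f"{alt} USE {concept.pref_label}"))
    rows.sort()  # tuples compare by sort key first, then by display text
    return [text for _, text in rows]


SAMPLE = [
    Concept("東京", "tokyo"),
    Concept("京都", "kyoto"),
    Concept("Paris", "paris"),
    Concept("Osama", "osama", {"Ousama": "ousama", "Usama": "usama"}),
]

if __name__ == "__main__":
    for line in alphabetical_index(SAMPLE):
        print(line)
```

Here the Japanese labels interleave with the Latin-script ones according to
their readings (京都 under k, 東京 under t), and a reader who looks up Ousama
or Usama is directed to the preferred form.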

cheers

Chaals

>>How about datasets that mix Japanese with English (multilingual
>>thesauri, etc.)?
>
>Leonard
>
>P.S. As I assume that we are all on the <public-esw-thes@w3.org>
>discussion list, I think it is best not to send duplicate copies to
>private email addresses, which just need to be checked to see whether
>they are really duplicates and then deleted.

This is true, but for various reasons it tends to be a pain and people forget
to do it...

Received on Saturday, 1 May 2004 07:24:37 UTC