RE: JW24a (i18n sort ordering) - Unicode 3.0

From: Jim Whitehead <ejw@ics.uci.edu>
Date: Wed, 3 May 2000 16:55:36 -0700
To: infonuovo@email.com, www-webdav-dasl@w3.org, duerst@w3.org
Message-ID: <NDBBIKLAGLCOPGKGADOJGEMCDCAA.ejw@ics.uci.edu>

Dennis Hamilton writes:
> The Unicode 3.0 specification does address sort orders more.

This is good to hear.

> I am not that confident in implementers as you!  (I am sitting here in
> Italy watching multi-language issues show up left and right as I
> research some connectivity problems using the Internet and various
> European -- the problem should be understood here, yes? -- customer
> support numbers, etc.)

OK, so I've been thinking about this problem a little more, and it still
seems really hard.

Let's look at one problem:

Assume you have a set of resources, a third of which are in English, a third
of which are in Italian, and the final third are in Japanese.  The titles of
these are stored as WebDAV properties, in the native language and character
set. Now assume you do a DASL query to retrieve the titles of some subset,
with the results returned in ascending order.  Further assume that the
result set contains some English, Italian, and Japanese resources.

What would be a reasonable response for a DASL server?  One answer is to
just return them grouped by language: first all the Italian ones, then
all the English ones, then all the Japanese ones.  But even this isn't
sufficient.  What if several of the resources are about the same subject?
In this case, the user would prefer to have them ordered by subject (for
example, all resources about Rome should be grouped together, whether Rome
is spelled "Rome" or "Roma").  So, it might make sense to sort all languages
expressible using ISO-Latin-1 characters (I think these correspond to the
first 256 code points in Unicode) together, but then list all other
resources by language.

However, I suspect there are other language groups that share an
alphabet, but I have no idea what they are.
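
To make the proposed ordering concrete, here is a minimal sketch in Python.
It is only an illustration under stated assumptions, not anything DASL
defines: the helper names (is_latin1, sort_key), the sample titles, and the
language tags are all made up, and plain code-point order stands in for a
real per-language collation.

```python
def is_latin1(text):
    """True if every character lies in U+0000..U+00FF (the ISO-8859-1 range)."""
    return all(ord(ch) <= 0xFF for ch in text)


def sort_key(resource):
    """Hypothetical key: Latin-1 titles share one bucket, the rest group by language."""
    title, lang = resource
    if is_latin1(title):
        # Bucket 0: sort all Latin-1 titles together, ignoring language and case.
        return (0, "", title.casefold())
    # Bucket 1: group by language tag, then by raw code-point order
    # (a placeholder for a proper language-specific collation).
    return (1, lang, title)


# Made-up sample data: (title, language tag) pairs.
resources = [
    ("Rome in Winter", "en"),
    ("Roma antica", "it"),
    ("ローマの休日", "ja"),
    ("Venezia", "it"),
    ("Rome Travel Guide", "en"),
    ("京都", "ja"),
]

ordered = sorted(resources, key=sort_key)
for title, lang in ordered:
    print(lang, title)
```

With this data the "Roma" and "Rome" titles end up next to each other, but
only because the two spellings share a prefix.  Nothing here actually groups
by subject, so titles whose spellings diverge more between languages would
still be scattered.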

Anyone have any other thoughts on this?

- Jim
Received on Wednesday, 3 May 2000 19:57:36 GMT