RE: JW18: case sensitivity

Alan Babich writes:
> I don't believe the DASL spec. is the right place to define or describe
> what must be described somewhere else about case sensitivity
> in non Latin character sets. DASL just relies on the underlying
> stuff to do the right thing.

I just delved into this issue a little bit, and I have some good news.

It turns out that the Unicode 2.0 standard (see
<http://www.unicode.org/unicode/uni2book/u2.html> for an overview) does
define the case of characters (in particular, this
information appears in an encoded form in
<ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt> and
<ftp://ftp.unicode.org/Public/UNIDATA/SpecialCasing.txt>).  So, there is a
good normative definition of case.
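
To make that concrete, here is a minimal sketch in Python, whose standard
library ships tables derived from the Unicode Character Database. The
helper name case_info is my own, purely for illustration, and the results
depend on the Unicode version built into the interpreter, which is much
later than 2.0.

```python
import unicodedata

def case_info(ch):
    """Return (general category, uppercase form, lowercase form) for one character.

    The category comes from data derived from UnicodeData.txt; upper() and
    lower() apply full case mappings, which can change string length.
    """
    return unicodedata.category(ch), ch.upper(), ch.lower()

# Latin, Cyrillic, and a character whose uppercase form is two characters.
for ch in ("A", "\u0434", "\u00df"):
    print(repr(ch), case_info(ch))
```

The point is only that case is defined per character in the Unicode data,
including for non-Latin scripts, so a spec can refer to it rather than
redefine it.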

Even more valuable, there is an approved Unicode technical report titled
"Unicode Collation Algorithm", available at
<http://www.unicode.org/unicode/reports/tr10/>, that describes "how to
compare two Unicode strings, using the data in a Collation Element Table,
and consistent with the linguistic requirements of The Unicode Standard,
Version 3.0, Section 5.XX Sorting and Searching. (Readers should be familiar
with that section before proceeding.)"  While Unicode version 3.0 doesn't
appear to be available yet, I do note that Unicode version 2.0 (The Unicode
Standard, Version 2.0, is published by Addison-Wesley, 1996, ISBN
0-201-48345-9) does have a section 5.15 titled "Sorting and Searching".
Hmmm, seems like that could be useful in addressing our i18n concerns.

The technical report goes into excruciating detail on how to perform
normalization, comparison, and sort ordering for Unicode.  Since these are
all open issues for DASL, I think this document could be something we
normatively reference to address them.  But it will require someone to
read through it and determine whether there are any options that need
to be expressible in a DASL query.
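
As a rough illustration of why normalization and case both matter for
comparison, here is a small Python sketch. To be clear, this is NOT the
collation algorithm from the technical report (it does no sort ordering and
uses no Collation Element Table); it only normalizes and case-folds before
testing equality. The function names caseless_key and caseless_equal are
hypothetical.

```python
import unicodedata

def caseless_key(s):
    """Build a comparison key: decompose, case-fold, then decompose again.

    The second normalization is there because case folding can produce
    strings that are no longer in normalized form.
    """
    folded = unicodedata.normalize("NFD", s).casefold()
    return unicodedata.normalize("NFD", folded)

def caseless_equal(a, b):
    """Return True if a and b match, ignoring case and canonical differences."""
    return caseless_key(a) == caseless_key(b)

# Precomposed vs. decomposed e-acute, differing in case.
print(caseless_equal("\u00e9", "E\u0301"))
# German sharp s vs. "SS".
print(caseless_equal("Stra\u00dfe", "STRASSE"))
```

Even this simplified version has choices in it (which normalization form,
whether case is significant), which is the kind of option we would need to
decide whether to expose in a query.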

I can do some work on this, but I'd appreciate some assistance.

- Jim

Received on Tuesday, 20 July 1999 17:48:05 UTC