RE: labels and case sensitivity from Dennis E. Hamilton on 2000-10-05 (ietf-dav-versioning@w3.org from October to December 2000)

From: Dennis E. Hamilton <infonuovo@email.com>
Date: Thu, 5 Oct 2000 07:54:50 -0700
To: "Ross Wetmore" <rwetmore@verticalsky.com>
Cc: "Geoffrey M. Clemm" <geoffrey.clemm@rational.com>, <ietf-dav-versioning@w3.org>
Message-ID: <NDBBKEGCNONMNKGDINPFAENNEMAA.infonuovo@email.com>
There's a misunderstanding here.

There are no "rules for case-insensitive comparison" that are implemented in
common among all implementations of standard libraries.  Having
non-standardized functions of the same name and signature is not solving
this problem.  Apparently, each of those implementers felt that the rules
weren't too difficult to grasp and were indeed self-evident.  The
consequence, in the current state of affairs, is that two WebDAV servers, or
a server and a client, coming up with the same case-insensitive compare
result is at best a coincidence or the happy consequence of having used the
same platform and development software.  (I found *different*
implementations in a comparison of Visual C++ 6.0 libraries, though.  This
is a wonderful  example of the value of leaving it to the implementation
absent a specification of the rules that are followed.)

I agree that having a single character set, in the fashion of Unicode, cuts
down the variables to be dealt with.  However, the rules for
case-insensitive comparison can still vary by locale, and because of
ambiguities in Unicode this does not seem to be resolved.

Please site an unambiguous specification of a Unicode caseless compare (is
there even one in Java, the house of run-the-same-everywhere?), and agree on
that as a requirement for DeltaV "case-insensitive comparison."   Then I
agree that you are home free.  Failing that, all that is being created is
people getting caught by interoperability problems in the wild, at a point
where it will be very difficult to resolve the problems.  There is no reason
to have these problem at all.

Here's the challenge.  Find a place where the rules are actually laid out,
rather than described in principal.  Shouldn't be hard yes?  Or even find an
implementation that you are willing to reverse-engineer the rules from.  I
say it is necessary to be specific.  Why not.  It is easy, yes?

-- Dennis

AIIM DMware Technical Coordinator
http://www.infonuovo.com/dmware
------------------
Dennis E. Hamilton
InfoNuovo
mailto:infonuovo@email.com
tel. +1-206-779-9430 (gsm)
fax. +1-425-793-0283
http://www.infonuovo.com

-----Original Message-----
From: Ross Wetmore [mailto:rwetmore@verticalsky.com]
Sent: Thursday, October 05, 2000 06:56
To: infonuovo@email.com
Cc: Geoffrey M. Clemm; ietf-dav-versioning@w3.org
Subject: Re: labels and case sensitivity


Sorry for the delay in response, practical realities can take precedence.

Most of these issues were dealt with in the early nineties by widespread
industry groups such as the Unicode Consortium (www.unicode.org) or
XOpen standards such as XPG4 in 1992 (www.opengroup.org).

I find it difficult to believe the rules for case-insensitive comparison
are that difficult to grasp, though I admit the implementations for the
old 8-bit character representations had to deal with the fundamental
ambiguities of context dependent mappings and can be pretty hairy
reading. The solution for such implementation problems of course is to
agree on the context or to remove the ambiguities in the character set
representations. The latter means either restricting use to the POSIX
subset of all locales, or using one of the "universal" sets like Unicode.

Instead of 8-bit string functions such as strcoll, you might want to look
at the WCS equivalents in most standard libraries. Issues such as no upper
or lower case form are smoothly handled by towupper(), towlower(). If
you are concerned about equivalent forms (which is stretching things for
case insensitivity), try wcscoll(). Most of the major programming
environments, Java, Windows, and commercial Unix variants should have all
the implementation tools needed. Many have a function such as wcsicmp(),
or String.equalsIgnoreCase().

I believe the current draft mandates UTF8-Unicode as the format for
exchanging character data. Further specification of the implementation
details in the standard is probably not in keeping with the tradition of
such things and should be left to the implementors, right?

Cheers,
RossW

"Dennis E. Hamilton" wrote:
>
> Oh oh,
>
> I dunno about case insensitivity.
>
> With regard to what locale?
>
> Who has a fixed case insensitivity rule that works the same across all
> locales in Unicode?
>
> This is only slightly less problematic than dealing with locale-preferred
> collating sequences.
>
> Case insensitive comparisons do not always work when one party is using
the
> C locale (in which we think we know what case insensitivity means and
which
> we tend to think is the universal locale for people's data but isn't) and
> another locale/language that has some letters that have no capital form,
> others that have no lower-case form, and none of them are in the
[extended?]
> "Roman" alphabet of the typical 26 A to Z.
>
> I am responding out of context, but I wanted to offer the warning that
> unless you are shoe-horned into ISO 646, this is not as obvious as most
> people think.
>
> I think if you dig through Plauger's text on the Standard C (or C++)
> string.h and ctypes.h library, you will see what there is to be concerned
> about.  By the way, this problem is of such interesting complexity that
> there is no caseless string compare in the Standard C library.  The
> non-standard ones are generally not well-specified and reading the actual
> code in certain widely-used library implementations is a real eye-opener.
>
> Unless you are prepared to rigorously specify the rules for
case-insensitive
> comparison, you are creating a problem, not solving one.
>
> -- Dennis
>
> AIIM DMware Technical Coordinator
> http://www.infonuovo.com/dmware
> ------------------
> Dennis E. Hamilton
> InfoNuovo
> mailto:infonuovo@email.com
> tel. +1-206-779-9430 (gsm)
> fax. +1-425-793-0283
> http://www.infonuovo.com
>
> -----Original Message-----
> From: ietf-dav-versioning-request@w3.org
> [mailto:ietf-dav-versioning-request@w3.org]On Behalf Of Ross Wetmore
> Sent: Monday, October 02, 2000 14:11
> To: Geoffrey M. Clemm
> Cc: ietf-dav-versioning@w3.org
> Subject: Re: labels and case sensitivity
>
> It might be more reasonable to change the wording to "should"
> for case preservation, and make it a "must" that all label
> comparisons are done case insensitive. There is no need to know
> anything about server properties to operate under these conditions.
>
> The argument for case preservation is generally one of usability or
> legibility in that CapitalizationCanHelpDistinguishContext. But
> there are enough cases where case does not exist or is lost, as when
> a user has to type in a very long string, that requiring it is too
> strong a condition.
>
> Case insensitive comparisons will always work without any further
> special handling and are the flip side of minimizing requirements
> but being flexible in accomodating different perspectives. The only
> significant argument against this is reduction in namespace size.
>
> Cheers,
> RossW
>
> "Geoffrey M. Clemm" wrote:
> >
> > Currently, we require that labels be case preserving and
> > that label comparison be case sensitive.
> > Chris Kaler would like to have it be server-defined whether
> > labels are case sensitive.
> >
> > Comments?
> >
> > Cheers,
> > Geoff
> >
> > (I'll save my comments for a followup, so as not to confuse
> > Chris' request with my comments).
Received on Thursday, 5 October 2000 10:52:12 UTC