Re: labels and case sensitivity from Ross Wetmore on 2000-10-05 (ietf-dav-versioning@w3.org from October to December 2000)

From: Ross Wetmore <rwetmore@verticalsky.com>
Date: Thu, 05 Oct 2000 09:56:11 -0400
To: infonuovo@email.com
Cc: "Geoffrey M. Clemm" <geoffrey.clemm@rational.com>, ietf-dav-versioning@w3.org
Message-ID: <39DC887B.38D3014D@verticalsky.com>
Sorry for the delay in response, practical realities can take precedence.

Most of these issues were dealt with in the early nineties by widespread
industry groups such as the Unicode Consortium (www.unicode.org) or
XOpen standards such as XPG4 in 1992 (www.opengroup.org).

I find it difficult to believe the rules for case-insensitive comparison
are that difficult to grasp, though I admit the implementations for the
old 8-bit character representations had to deal with the fundamental
ambiguities of context dependent mappings and can be pretty hairy
reading. The solution for such implementation problems of course is to
agree on the context or to remove the ambiguities in the character set
representations. The latter means either restricting use to the POSIX
subset of all locales, or using one of the "universal" sets like Unicode.

Instead of 8-bit string functions such as strcoll, you might want to look 
at the WCS equivalents in most standard libraries. Issues such as no upper
or lower case form are smoothly handled by towupper(), towlower(). If
you are concerned about equivalent forms (which is stretching things for
case insensitivity), try wcscoll(). Most of the major programming 
environments, Java, Windows, and commercial Unix variants should have all 
the implementation tools needed. Many have a function such as wcsicmp(), 
or String.equalsIgnoreCase().

I believe the current draft mandates UTF8-Unicode as the format for
exchanging character data. Further specification of the implementation
details in the standard is probably not in keeping with the tradition of
such things and should be left to the implementors, right?

Cheers,
RossW

"Dennis E. Hamilton" wrote:
> 
> Oh oh,
> 
> I dunno about case insensitivity.
> 
> With regard to what locale?
> 
> Who has a fixed case insensitivity rule that works the same across all
> locales in Unicode?
> 
> This is only slightly less problematic than dealing with locale-preferred
> collating sequences.
> 
> Case insensitive comparisons do not always work when one party is using the
> C locale (in which we think we know what case insensitivity means and which
> we tend to think is the universal locale for people's data but isn't) and
> another locale/language that has some letters that have no capital form,
> others that have no lower-case form, and none of them are in the [extended?]
> "Roman" alphabet of the typical 26 A to Z.
> 
> I am responding out of context, but I wanted to offer the warning that
> unless you are shoe-horned into ISO 646, this is not as obvious as most
> people think.
> 
> I think if you dig through Plauger's text on the Standard C (or C++)
> string.h and ctypes.h library, you will see what there is to be concerned
> about.  By the way, this problem is of such interesting complexity that
> there is no caseless string compare in the Standard C library.  The
> non-standard ones are generally not well-specified and reading the actual
> code in certain widely-used library implementations is a real eye-opener.
> 
> Unless you are prepared to rigorously specify the rules for case-insensitive
> comparison, you are creating a problem, not solving one.
> 
> -- Dennis
> 
> AIIM DMware Technical Coordinator
> http://www.infonuovo.com/dmware
> ------------------
> Dennis E. Hamilton
> InfoNuovo
> mailto:infonuovo@email.com
> tel. +1-206-779-9430 (gsm)
> fax. +1-425-793-0283
> http://www.infonuovo.com
> 
> -----Original Message-----
> From: ietf-dav-versioning-request@w3.org
> [mailto:ietf-dav-versioning-request@w3.org]On Behalf Of Ross Wetmore
> Sent: Monday, October 02, 2000 14:11
> To: Geoffrey M. Clemm
> Cc: ietf-dav-versioning@w3.org
> Subject: Re: labels and case sensitivity
> 
> It might be more reasonable to change the wording to "should"
> for case preservation, and make it a "must" that all label
> comparisons are done case insensitive. There is no need to know
> anything about server properties to operate under these conditions.
> 
> The argument for case preservation is generally one of usability or
> legibility in that CapitalizationCanHelpDistinguishContext. But
> there are enough cases where case does not exist or is lost, as when
> a user has to type in a very long string, that requiring it is too
> strong a condition.
> 
> Case insensitive comparisons will always work without any further
> special handling and are the flip side of minimizing requirements
> but being flexible in accomodating different perspectives. The only
> significant argument against this is reduction in namespace size.
> 
> Cheers,
> RossW
> 
> "Geoffrey M. Clemm" wrote:
> >
> > Currently, we require that labels be case preserving and
> > that label comparison be case sensitive.
> > Chris Kaler would like to have it be server-defined whether
> > labels are case sensitive.
> >
> > Comments?
> >
> > Cheers,
> > Geoff
> >
> > (I'll save my comments for a followup, so as not to confuse
> > Chris' request with my comments).
Received on Thursday, 5 October 2000 09:55:56 UTC