Re: [CSS21] Case-insensitivity not defined from Martin Duerst on 2007-11-16 (www-international@w3.org from October to December 2007)

From: Martin Duerst <duerst@it.aoyama.ac.jp>
Date: Fri, 16 Nov 2007 14:09:51 +0900
To: fantasai <fantasai.lists@inkedblade.net>, www-international@w3.org
Cc: www-style@w3.org, "'WWW International'" <www-international@w3.org>
Message-Id: <6.0.0.20.2.20071116133955.07b1e1f0@localhost>

I tend to go with what Fantasai says below, and what Anne also
seems to have expressed: The case sensitivity needs in CSS are
very limited. As far as I understand, we have three cases:

- CSS keywords: These are US-ASCII only, and therefore the
  simplest case sensitivity is okay. Actually, 99% or more
  of the stylesheets that I have seen use only lower case,
  so it wouldn't have been a great problem if these had been
  defined case-sensitive originally, but of course it's
  impossible to go back and change things now.

- Identifiers in the markup languages that CSS works with
  (e.g. HTML and XML element and attribute names): Here
  CSS says that case sensitivity depends on the language
  involved. For traditional HTML, element names are case-
  insensitive, therefore CSS treats them as case-insensitive.
  For XML, it is the other way round. The only thing that
  CSS can reasonably do here is to follow whatever the
  target language specifies, both for the basic question of
  whether names are case-sensitive and for the details regarding
  non-ASCII characters, if applicable. I guess it would be
  good if the CSS spec explicitly pointed out that such
  details may vary depending on the target language.
  Note that the question of case sensitivity isn't simply
  a per-language thing; it's easily possible that there
  are variations within a language. CSS2 already describes
  this at http://www.w3.org/TR/REC-CSS2/syndata.html#q4,
  explaining that ids, classes, and font names are case-
  sensitive even in traditional HTML.

- Identifiers within CSS. These include cases such as
  namespace prefixes and counter names inside CSS.
  Ideally, these should simply be case-sensitive; I don't
  think it's asking too much from stylesheet writers to
  use the same case for all occurrences of a specific
  counter name. If that's not possible for legacy reasons
  (e.g. stylesheets that indeed use counter names and
  friends with haphazard casing), then something like
  'case-insensitive for US-ASCII, case sensitive for
  the rest', even though it sounds terribly ugly, may
  be the best solution.
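
To make the two options for CSS-internal identifiers concrete, here is a
minimal sketch in Python. The function names and the sample counter names
are made up for illustration; nothing here is taken from a CSS
specification or from an actual implementation.

```python
def ascii_lower(s: str) -> str:
    """Lower-case only the letters A-Z; leave every other character as it is."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def same_identifier_case_sensitive(a: str, b: str) -> bool:
    """The ideal rule: two identifiers match only if they are identical."""
    return a == b


def same_identifier_ascii_insensitive(a: str, b: str) -> bool:
    """The fallback rule: case-insensitive for US-ASCII, case-sensitive for the rest."""
    return ascii_lower(a) == ascii_lower(b)


# Hypothetical counter names.
print(same_identifier_case_sensitive("Chapter", "chapter"))
print(same_identifier_ascii_insensitive("Chapter", "chapter"))
print(same_identifier_ascii_insensitive("\u00c4rger", "\u00e4rger"))
```

Under the fallback rule, "Chapter" and "chapter" name the same counter,
while two names that differ only in the case of a non-ASCII letter (the
third call) remain distinct.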

Regards,    Martin.

At 08:45 07/11/16, fantasai wrote:
>
>Addison Phillips wrote:
>> 
>>> I find that the basic Latin letters do match each other and nothing
>>> else, if you ignore the language-specific foldings, with one exception.
>>> U+212A KELVIN SIGN, which looks exactly like "K" and shouldn't exist
>>> anyhow (it's compatibility equivalent to a proper "K") is case-folded
>>> to "k".  I consider that to come under the heading of the Right Thing.
>> Compatibility characters always present a problem of this sort. I think this is also the Right Thing.

Compatibility characters should not be honored by trying to match
them to others. The best thing here is to isolate and quarantine them
so that they die out :-(.

>>> It's also true that some ligatures are case-folded to their spelled out
>>> equivalents:  for example, U+FB00 LATIN SMALL LIGATURE FF is case-folded
>>> to simple "ff".
>> This is actually a Good Thing too.

No, for CSS it definitely would be overkill.
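
For reference, the two foldings mentioned above can be observed with
Python's str.casefold(), which implements Unicode full case folding. This
is only an illustration of what the Unicode data does; the constant names
are mine.

```python
KELVIN_SIGN = "\u212a"   # U+212A KELVIN SIGN
LIGATURE_FF = "\ufb00"   # U+FB00 LATIN SMALL LIGATURE FF

# The Kelvin sign folds to a plain ASCII "k".
kelvin_folded = KELVIN_SIGN.casefold()

# The ligature folds to two characters, so folding can change string length.
ff_folded = LIGATURE_FF.casefold()

print(kelvin_folded == "k")
print(ff_folded == "ff", len(LIGATURE_FF), len(ff_folded))
```

The second case shows why this is more than CSS needs: a comparison based
on full case folding can no longer assume that matching strings have the
same length.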

>It's a Good Thing for natural-language matching and search results. It is
>imho not a Good Thing for defining case-insensitivity for keywords in a
>computer language. Since CSS keywords are all limited to the ASCII range,
>it should be possible to reliably match against CSS keywords with only
>ASCII case-insensitivity. Throwing in random other characters into the mix
>can cause confusion and possibly also result in security holes. I believe
>the potential problems in that respect outweigh the convenience of
>case-insensitivity for non-Latin user-defined identifiers.
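
A small sketch of the difference in Python, assuming a made-up keyword set
(three real CSS keywords, but in no way a complete list) and hypothetical
function names:

```python
import string

# Illustrative subset only; a real implementation would have many more.
CSS_KEYWORDS = frozenset({"block", "inline", "none"})

# Translation table that maps A-Z to a-z and touches nothing else.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def is_keyword_ascii(token: str) -> bool:
    """Match keywords with ASCII-only case-insensitivity."""
    return token.translate(_ASCII_LOWER) in CSS_KEYWORDS


def is_keyword_unicode(token: str) -> bool:
    """Match keywords with Unicode full case folding (shown for contrast)."""
    return token.casefold() in CSS_KEYWORDS


# "bloc" followed by U+212A KELVIN SIGN, which renders like "blocK".
lookalike = "bloc\u212a"

print(is_keyword_ascii("BLOCK"))
print(is_keyword_ascii(lookalike))
print(is_keyword_unicode(lookalike))
```

With ASCII-only matching the lookalike token is not a keyword; with full
case folding it is accepted as "block", which is the kind of surprise
that could turn into a security problem.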

Two little remarks here:
- There are not too many non-Latin scripts that have cases. These
  are usually simpler than Latin itself, because they don't have
  issues such as the Turkish/Azeri I/i. So this is a non-ASCII,
  but very much Latin script, issue.
- Case insensitivity is a user convenience mostly in cases where
  case conventions are not well established, and where users are
  often guessing identifiers, or have to remember them for repeated
  use. The examples we are really dealing with, such as counter
  names, are very local, and aren't used on a regular basis by
  plain end users. For such cases, the 'convenience' issue is of
  much lower importance.
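
To illustrate the first remark: Python's str.lower() applies only the
language-independent default mapping, so the Turkish/Azeri rule has to be
added by hand. The helper turkish_lower below is a hypothetical sketch of
that rule, not a complete locale-aware implementation.

```python
def turkish_lower(s: str) -> str:
    """Hypothetical helper: apply the two Turkish/Azeri exceptions first,
    then fall back to the default lower-case mapping.

    U+0130 (capital I with dot above) -> "i"
    "I"                               -> U+0131 (dotless small i)
    """
    return s.replace("\u0130", "i").replace("I", "\u0131").lower()


# Default mapping: "I" always becomes "i".
print("TITLE".lower())

# Turkish/Azeri mapping: "I" becomes dotless i.
print(turkish_lower("TITLE"))

# A cased non-Latin script (Cyrillic) needs no such exception.
print("\u041f\u0420\u0418\u0412\u0415\u0422".lower())
```

The same input therefore lower-cases differently depending on the
language, which is exactly the complication that the cased non-Latin
scripts do not have.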


Regards,    Martin.



#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp
Received on Friday, 16 November 2007 05:39:46 UTC