RE: Results of CSS case-sensitivity discussion at TPAC from Phillips, Addison on 2012-12-04 (www-international@w3.org from October to December 2012)

From: Phillips, Addison <addison@lab126.com>
Date: Tue, 4 Dec 2012 10:43:39 -0800
To: Anne van Kesteren <annevk@annevk.nl>
CC: "Tab Atkins Jr." <jackalmage@gmail.com>, WWW International <www-international@w3.org>
Message-ID: <131F80DEA635F044946897AFDA9AC34773A8E60EE6@EX-SEA31-D.ant.amazon.com>

Anne van Kesteren wrote:
> 
> I think this is overkill. The whole platform apart from JavaScript uses ASCII
> case-insensitivity, largely for historical reasons (code-point-for-code-point
> would be preferable). There's no reason to invoke complex algorithms to
> compare CSS identifiers as long as they are not necessitated elsewhere.

"The whole platform" is an overstatement. HTML5 carefully avoids the issues: ASCII case insensitivity is used extensively where the namespace is already limited to (ASCII-only) identifiers. Where non-ASCII tokens can appear, you invariably find that HTML5 specifies "case-sensitive" comparison (which I may as well point out is also a "normalization sensitive" comparison, code point by code point).

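To illustrate the parenthetical above, here is a minimal Python sketch of why a code-point-by-code-point comparison is sensitive to both case and normalization. The helper name `code_point_equal` and the sample strings are assumptions made for illustration; nothing here is taken from HTML5 or from any browser implementation.

```python
import unicodedata


def code_point_equal(a: str, b: str) -> bool:
    """Compare two strings code point by code point (hypothetical helper)."""
    # Python's == on str already compares sequences of code points.
    return a == b


# The same visible word in two Unicode normalization forms.
nfc = unicodedata.normalize("NFC", "caf\u00e9")  # 'é' as a single code point
nfd = unicodedata.normalize("NFD", "caf\u00e9")  # 'e' followed by U+0301

# Different code point sequences, so the comparison treats them as different.
differs_by_normalization = not code_point_equal(nfc, nfd)

# Normalizing both sides first makes them compare equal.
equal_after_nfc = code_point_equal(nfc, unicodedata.normalize("NFC", nfd))
```

Under this kind of comparison, "Café" and "café" are also distinct, which is the "case-sensitive" half of the same behavior.
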
The "complex algorithms" you mention might actually be easier to implement than you think, though. Most case-insensitive comparison functions in standard libraries are already internationalized and do approximately the right thing. A quick survey of the browsers on my desktop computer using the following page shows that IE9, Opera, Safari, and Chrome are already case-insensitive for non-ASCII characters (only Firefox seems to be ASCII-only case-insensitive):

   http://www.inter-locale.com/test/css-case-sensitive-test.html

So a case could even be made that non-ASCII caseless comparison is actually "what browsers do".

> 
> I can see how this recommendation makes sense from an i18n perspective, but
> if you look at the platform holistically it does not seem needed and would in
> fact make the platform more inconsistent and less predictable.
> 
"More inconsistent and less predictable" from what point of view? Unless you have been specifically told to stick to ASCII-only identifiers "because that's what works", the most natural thing for most content authors would be to use names that are meaningful to them. What should work for these (relatively less programming-oriented) folks?

Past practice in CSS has been to use ASCII-only identifiers for a variety of historical reasons (not least of which was a lack of character encoding support), which contributes to the sense that ASCII-only case-insensitivity is a "compatibility" dodge for historical reasons. So I think that the question has to be whether case-insensitivity is a feature or not.

If case-insensitivity is not a feature but is merely a historical artifact, then ASCII-only insensitivity makes some kind of sense, although it is just as inconvenient to implement: you still have to write a special "caseless comparison" function and then build tests to ensure that it works correctly and performs well. It also still makes for a crummy authoring experience. You have to keep in mind at all times the difference between the ASCII keys on your keyboard and "the other keys". And software-generated pages can be harder to manage too.

If, by contrast, case-insensitivity is a feature, then probably the most responsible way to specify it would be to use Unicode case folding, and that's the position that the Internationalization WG decided to take.
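
The difference between the two options discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and Python's built-in `str.casefold()` stands in for "Unicode case folding"; the message does not spell out the exact algorithm the WG has in mind.

```python
def ascii_lower(s: str) -> str:
    """Lowercase only A-Z; every other code point is left untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def ascii_caseless_equal(a: str, b: str) -> bool:
    """ASCII-only case-insensitive match (hypothetical helper)."""
    return ascii_lower(a) == ascii_lower(b)


def unicode_caseless_equal(a: str, b: str) -> bool:
    """Caseless match using Python's Unicode case folding (hypothetical helper).

    Note: this does not normalize, so NFC and NFD spellings still differ.
    """
    return a.casefold() == b.casefold()


# ASCII identifiers behave the same under both definitions.
both_match_ascii = (
    ascii_caseless_equal("HEADER", "header")
    and unicode_caseless_equal("HEADER", "header")
)

# A non-ASCII identifier matches only under Unicode case folding.
upper, lower = "\u00c9T\u00c9", "\u00e9t\u00e9"  # "ÉTÉ" and "été"
ascii_result = ascii_caseless_equal(upper, lower)
unicode_result = unicode_caseless_equal(upper, lower)
```

With the ASCII-only function, an author who writes "ÉTÉ" in one place and "été" in another gets a mismatch, while "HEADER" and "header" match; this is the "ASCII keys versus the other keys" distinction described earlier.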

Addison

Addison Phillips
Globalization Architect (Lab126)
Chair (W3C I18N WG)

Internationalization is not a feature.
It is an architecture.

Received on Tuesday, 4 December 2012 18:44:26 UTC