Re: [CSS21][css3-namespace][css3-page][css3-selectors][css3-content] Unicode Normalization from Henri Sivonen on 2009-02-06 (www-style@w3.org from February 2009)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Fri, 6 Feb 2009 11:36:08 +0200
To: Richard Ishida <ishida@w3.org>
Cc: <public-i18n-core@w3.org>, "'W3C Style List'" <www-style@w3.org>
Message-Id: <4AD97101-83A2-4637-A36E-D3F06E8B4003@iki.fi>

On Feb 5, 2009, at 19:23, Richard Ishida wrote:

> Henri, I think that if we follow your argument we should expect to  
> see far more ids such as id1, id2, id3, or aa, ab, ac... etc.  But  
> actually people tend to regularly use id and class names that make  
> some sense, are easy to remember, and relate to the topic at hand.

No, we shouldn't expect to see just numbers if the ability to use  
identifiers with a mnemonic textual interpretation is available.  
However, space-separated tokens don't allow e.g. English compound  
nouns to be used with their correct written forms, so we should expect  
English compound nouns used as ids to be adapted to the relevant  
constraints by using e.g. hyphens, underscores or camelCasing.

> Well, if you speak and think in excellent English there's no big  
> deal with codepoint for codepoint comparison.  But if you speak and  
> think in Vietnamese, Burmese, Khmer, Tamil, Malayalam, Kannada,  
> Telugu, Sinhala, Tlįchǫ Yatìi, Dënesųłįne, Dene Zhatié– 
> Shihgot’ine, Gwich’in, Dɛnɛsųłįnɛ, Igbo, Yoruba, Arabic,  
> Urdu, Azeri, Tibetan, Japanese, Chinese, Russian, Serbian, etc. etc.  
> and especially if your content is in that language, then it wouldn't  
> be so surprising that you would want to write class names and ids in  
> that language too, and I think we need to investigate what is needed  
> to support that.

Using class names or ids made of words in those languages is enabled.  
It's just that inconsistent defects in text input software may lead to  
surprises in some cases. However, to get rid of the surprises, the  
text input methods should be fixed instead of complicating other  
software.

If the various key strokes that can produce ü on European text input  
methods produced different code point sequences, it would be rightly  
considered a defect in the input methods. On the other hand, very  
complex input cooking between keystrokes and the code points  
communicated to an application accepting text input already exists:  
consider cooking many Romaji keystrokes into one Kanji.
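
As a rough illustration of the ü case (a sketch only, assuming Python's standard unicodedata module; the names precomposed, decomposed and cook are made up for this example and do not describe any real input method), two code point sequences that render identically compare unequal code point for code point, and a single normalization step on the input side makes them agree:

```python
import unicodedata

# Two ways a hypothetical input method might deliver "ü".
precomposed = "\u00fc"   # LATIN SMALL LETTER U WITH DIAERESIS
decomposed = "u\u0308"   # "u" followed by COMBINING DIAERESIS


def cook(text: str) -> str:
    """Hypothetical input-side 'cooking': emit NFC so that every
    keystroke path yields the same code point sequence."""
    return unicodedata.normalize("NFC", text)


# Code point for code point comparison sees two different strings.
raw_equal = precomposed == decomposed

# After cooking on the input side, the plain comparison succeeds.
cooked_equal = cook(precomposed) == cook(decomposed)
```

Here raw_equal is False and cooked_equal is True; the consumer of the text still does nothing more than a plain string comparison.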

If input methods for the languages you mention are inconsistent in  
their ordering of combining marks where the ordering of marks is  
visually indistinguishable, that's a defect of those input methods.  
Cooking the input to yield canonically ordered output should be a  
minor feat considering the infrastructure that already exists for e.g.  
Japanese text input methods and the visual rendering integration that  
e.g. Mac OS X does when you type the umlaut and the u separately for  
ü. The right place to fix is the input methods--not software further  
down the chain processing text. After all, text is written fewer times  
than it is read. Furthermore, if you count GUI environments that  
handle text input, the number of systems where the fix needs to be  
applied is relatively small--just like the number of browser engines  
is relatively small, which is often used as an argument for putting  
complexity into browser engines.
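
To make the combining-mark ordering case concrete, here is a minimal sketch, again assuming Python's standard unicodedata module. The example character (a Vietnamese "a" with circumflex and dot below) and the helper name canonical_order are chosen purely for illustration. The two sequences differ only in the order of marks that do not interact visually, and canonical reordering on the input side maps both to the same output:

```python
import unicodedata

DOT_BELOW = "\u0323"    # COMBINING DOT BELOW, combining class 220
CIRCUMFLEX = "\u0302"   # COMBINING CIRCUMFLEX ACCENT, combining class 230

# The same visible character typed with the marks in either order.
marks_order_1 = "a" + CIRCUMFLEX + DOT_BELOW
marks_order_2 = "a" + DOT_BELOW + CIRCUMFLEX


def canonical_order(text: str) -> str:
    """Hypothetical input-side step: NFD decomposes and sorts
    combining marks by their canonical combining class."""
    return unicodedata.normalize("NFD", text)


# The canonical combining classes decide the order of the marks.
classes = [unicodedata.combining(c) for c in (DOT_BELOW, CIRCUMFLEX)]

# Uncooked input differs; canonically ordered input does not.
raw_equal = marks_order_1 == marks_order_2
ordered_equal = canonical_order(marks_order_1) == canonical_order(marks_order_2)
```

In this sketch raw_equal is False and ordered_equal is True, with the dot below (class 220) sorted ahead of the circumflex (class 230) in both outputs.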

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Friday, 6 February 2009 09:36:52 UTC