Subject: Re: [CSS21][css3-namespace][css3-page][css3-selectors][css3-content] Unicode Normalization

From: Philip TAYLOR (Ret'd) <P.Taylor@Rhul.Ac.Uk>
Date: Thu, 05 Feb 2009 16:22:42 +0000
To: Henri Sivonen <hsivonen@iki.fi>
CC: Jonathan Kew <jonathan@jfkew.plus.com>, Andrew Cunningham <andrewc@vicnet.net.au>, public-i18n-core@w3.org, W3C Style List <www-style@w3.org>
Message-ID: <498B1252.2070302@Rhul.Ac.Uk>

Henri Sivonen wrote:

> You do realize that the language I speak natively isn't invariant under 
> Unicode normalization when written? 

I hadn't appreciated that point (you do, after all,
write perfect English) but I don't think it is
strictly relevant here.

>> Yet for many (perhaps most) of the world's languages, comparison by 
>> code-point is noticeably sub-optimal.

> Sure. However, easy equality checking is a more important characteristic 
> of computer language identifiers than natural language optimality. 

Surely we now have sufficient processing power available
that adopting "easy" solutions is no longer the primary
concern.  Given that Unicode has the concept of "canonical
equivalence", it seems to me that in designing Unicode-
based systems we should be setting out to exploit that
equivalence, rather than ignoring it.

> That identifiers 
> aren't just binary numbers but have some mnemonic textual interpretation 
> is just a bonus for convenience. We shouldn't get carried away thinking 
> that natural language expression is the primary point of having e.g. 
> HTML ids.

No, of course it's not the /primary/ point, but it is a very
important point none the less.  Suppose, for example, I were
Vietnamese, and wanted to differentiate snakes from
other reptiles; would it be unreasonable of me to want to
be able to write <span class="rắn"> ... </span> wherever
a snake occurred in the text, and to have that class match
the corresponding CSS rule for ".rắn {}", even if the CSS
had been created using a different authoring system that
generated a different internal representation for "rắn"?
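
To make the concern concrete, here is a minimal sketch in Python,
assuming only its standard unicodedata module. The helper name
canonically_equal is hypothetical, and the sketch does not describe
how any existing user agent compares class names; it merely shows
that two canonically equivalent spellings of "rắn" differ code point
by code point, yet compare equal once both are normalised to NFC.

```python
import unicodedata


def canonically_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two identifiers under Unicode
    canonical equivalence by normalising both to NFC first."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# "rắn" with the precomposed letter U+1EAF
# (LATIN SMALL LETTER A WITH BREVE AND ACUTE).
precomposed = "r\u1eafn"

# The same word fully decomposed: "a" followed by
# U+0306 COMBINING BREVE and U+0301 COMBINING ACUTE ACCENT.
decomposed = "ra\u0306\u0301n"

# Code-point comparison treats the two spellings as different strings.
codepoint_match = precomposed == decomposed

# Comparison under canonical equivalence treats them as the same identifier.
canonical_match = canonically_equal(precomposed, decomposed)

print(len(precomposed), len(decomposed))
print(codepoint_match, canonical_match)
```

Under code-point comparison the class in the markup and the selector
in the style sheet would fail to match whenever the two authoring
systems chose different forms; under the normalising comparison they
would match.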

Philip TAYLOR

Received on Thursday, 5 February 2009 16:23:21 UTC