- From: Philip TAYLOR (Ret'd) <P.Taylor@Rhul.Ac.Uk>
- Date: Thu, 05 Feb 2009 16:22:42 +0000
- To: Henri Sivonen <hsivonen@iki.fi>
- CC: Jonathan Kew <jonathan@jfkew.plus.com>, Andrew Cunningham <andrewc@vicnet.net.au>, public-i18n-core@w3.org, W3C Style List <www-style@w3.org>
Henri Sivonen wrote:

> You do realize that the language I speak natively isn't invariant under
> Unicode normalization when written?

I hadn't appreciated that point (you do, after all, write
perfect English) but I don't think it is strictly relevant here.

>> Yet for many (perhaps most) of the world's languages, comparison by
>> code-point is noticeably sub-optimal.

> Sure. However, easy equality checking is a more important characteristic
> of computer language identifiers than natural language optimality.

Surely we now have sufficient processing power available that
adopting "easy" solutions is no longer the primary concern.
Given that Unicode has the concept of "canonical equivalence",
it seems to me that in designing Unicode-based systems we
should be setting out to exploit that equivalence, rather
than ignoring it.

> That identifiers
> aren't just binary numbers but have some mnemonic textual interpretation
> is just a bonus for convenience. We shouldn't get carried away thinking
> that natural language expression is the primary point of having e.g.
> HTML ids.

No, of course it's not the /primary/ point, but it is a very
important point none the less. Suppose, for example, I were
Vietnamese, and wanted to differentiate snakes from other
reptiles; would it be unreasonable of me to want to be able
to write

    <span class="rắn"> ... </span>

wherever a snake occurred in the text, and to have that class
match the corresponding CSS rule for ".rắn {}", even if the CSS
had been created using a different authoring system that
generated a different internal representation for "rắn" ?

Philip TAYLOR
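
To make the "rắn" scenario concrete, here is a minimal Python sketch of the mismatch under discussion. It is an illustration only, not a description of how any browser or CSS implementation behaves: the helper name `canonically_equal` is hypothetical, and the choice of NFC as the comparison form is an assumption (any one normalization form applied to both sides would serve).

```python
import unicodedata

# Two canonically equivalent spellings of the Vietnamese word "rắn".
# One authoring system might emit the precomposed letter, another the
# base letter followed by combining marks.
precomposed = "r\u1eafn"        # U+1EAF LATIN SMALL LETTER A WITH BREVE AND ACUTE
decomposed = "ra\u0306\u0301n"  # "a" + COMBINING BREVE + COMBINING ACUTE ACCENT


def canonically_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Code-point comparison treats the two spellings as different identifiers.
codepoint_match = precomposed == decomposed

# Comparison under canonical equivalence treats them as the same identifier.
canonical_match = canonically_equal(precomposed, decomposed)

print(codepoint_match, canonical_match)
```

With code-point comparison, a class attribute written in one form would fail to match a selector written in the other; with normalization applied to both sides first, the two would match.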
Received on Thursday, 5 February 2009 16:23:21 UTC