- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Thu, 5 Feb 2009 17:52:12 +0200
- To: "Philip TAYLOR (Ret'd)" <P.Taylor@Rhul.Ac.Uk>
- Cc: Jonathan Kew <jonathan@jfkew.plus.com>, Andrew Cunningham <andrewc@vicnet.net.au>, public-i18n-core@w3.org, W3C Style List <www-style@w3.org>
On Feb 5, 2009, at 17:31, Philip TAYLOR (Ret'd) wrote:

> Henri Sivonen wrote:
>
>> My point is that it's generally not helpful to bring out the
>> Western bias[1] thing in discussions of using Unicode in computer
>> languages. Previously, too, performance has been preferred over
>> full natural language complexity for computer language identifier
>> equality comparison and in that instance clearly it could not have
>> been an issue of Western bias. The thing is that comparing computer
>> language identifiers code point for code point is the common-sense
>> thing to do.
>
> With respect, it is the /simplest/ thing to do. For those
> who work in anything more complex than English, it is
> probably anything /but/ "common sense".

You do realize that the language I speak natively isn't invariant under Unicode normalization when written? Yet, I don't insist that e.g. XML parsers consult the Unicode database when doing string equality matching.

(Yeah, the input methods for my native language are pretty consistent in producing NFC, so I don't actively feel the pain of normalization-inconsistent input methods, but still occasionally I need to explain NFD to people--mostly when Mac OS X happens to leak its internals into interchange.)

>> If you consider the lack of case-insensitivity, some languages are
>> not perfectly convenienced. If you consider the lack normalization,
>> another (overlapping) set of languages is not perfectly
>> convenienced. If you consider the sensitivity to diacritics, yet
>> another set of languages is not perfectly convenienced. No language
>> is prohibited by code point for code point comparison, though.
>
> Yet for many (perhaps most) of the world's languages, comparison by
> code-point is noticeably sub-optimal.

Sure. However, easy equality checking is a more important characteristic of computer language identifiers than natural language optimality. (The content carried by XML and HTML is a different story.)

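
For readers who want to see the trade-off concretely, here is a minimal Python sketch of the two kinds of identifier comparison discussed above. It is only an illustration: the function names and the sample identifier are made up for this example, and the normalization-aware variant is shown for contrast, not as something any XML or HTML parser is claimed to do.

```python
import unicodedata

def codepoint_equal(a: str, b: str) -> bool:
    """Compare two identifiers code point for code point (no Unicode database needed)."""
    return a == b

def nfc_equal(a: str, b: str) -> bool:
    """Compare two identifiers after normalizing both to NFC (needs Unicode data tables)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# Hypothetical identifier written two ways:
# precomposed uses U+00E4 (a with diaeresis), as NFC-producing input methods emit;
# decomposed uses "a" followed by U+0308 (combining diaeresis), as in NFD.
precomposed = "p\u00e4\u00e4"
decomposed = "pa\u0308a\u0308"

print(codepoint_equal(precomposed, decomposed))  # the two spellings differ as code point sequences
print(nfc_equal(precomposed, decomposed))        # they match once both are normalized
```

The first comparison is a plain sequence comparison; the second has to consult normalization data for every comparison, which is the cost being weighed against natural language convenience.
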
That identifiers aren't just binary numbers but have some mnemonic textual interpretation is just a bonus for convenience. We shouldn't get carried away thinking that natural language expression is the primary point of having e.g. HTML ids.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Thursday, 5 February 2009 15:52:56 UTC