- From: <bbauma1@cs.umbc.edu>
- Date: Mon, 16 Sep 1996 07:17:19 +0000
- To: w3c-sgml-wg@w3.org
Michael Sperberg-McQueen writes:

> Is this compromise workable?

The short answer: yes.

The long answer: thanks for the summary of the discussion so far, although I would characterize my position more as "One Hundred Flowers are going to bloom; how are we going to deal with it?", and you propose a good solution.

One concern I still have is the role of UTF-8 as both the normalized form and possibly the only form for network transmission. UTF-8 is biased toward the efficient encoding of some languages at the expense of others. Although I understand the pragmatic reasons for specifying UTF-8 as the standard, I am still uncomfortable with this bias. Using UTF-16 is one solution to this problem.

One other possible solution is an encoding scheme from Reuters that was presented at the Unicode conference a couple of weeks ago. I was not actually at the conference and don't have the paper in front of me, so Gavin, please correct me if I get this wrong. Basically, the Reuters scheme uses a sliding window, 256 characters wide, across the Unicode character space, with the window's starting point set by a shift key followed by a couple of bytes that specify the offset. By default the window starts at the beginning of the Unicode character space, so ASCII is represented unchanged. This encoding also allows any other alphabetic language to be represented efficiently. It is not quite so efficient with languages that require multiple bytes, such as Chinese, Korean, etc., but as the paper points out, each character in those languages carries quite a bit more information than a single alphabetic character. The main problem with this encoding is that it is not a standard.

B. Todd Bauman
Graduate Student
University of Maryland, Baltimore County
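P.S. As a rough sketch of the scheme as described above (the shift-marker value, the two-byte offset layout, and the rule for re-positioning the window are assumptions here, not the actual Reuters specification), an encoder along these lines might look like the following Python; it assumes 16-bit code points, as in the current Unicode standard:

    # Illustrative sketch only: a 256-character sliding window over the
    # Unicode code space.  The window defaults to offset 0, so ASCII
    # passes through as single bytes; a shift marker followed by two
    # bytes re-positions the window for other scripts.

    SHIFT = 0xFF          # assumed marker byte; the real value is not given here
    WINDOW_SIZE = 256

    def encode(text):
        out = bytearray()
        window = 0        # window start defaults to U+0000
        for ch in text:
            cp = ord(ch)
            offset = cp - window
            if not (0 <= offset < WINDOW_SIZE) or offset == SHIFT:
                # Re-position the window; starting it at the character
                # itself keeps the sketch simple and guarantees the
                # emitted offset never collides with the marker byte.
                window = cp
                offset = 0
                out.append(SHIFT)
                out += window.to_bytes(2, "big")   # two-byte offset, per the description
            out.append(offset)                     # one byte per character inside the window
        return bytes(out)

    def decode(data):
        chars = []
        window = 0
        i = 0
        while i < len(data):
            b = data[i]
            if b == SHIFT:
                window = int.from_bytes(data[i + 1:i + 3], "big")
                i += 3
            else:
                chars.append(chr(window + b))
                i += 1
        return "".join(chars)

    # "ABC" stays three bytes; a run of Cyrillic costs one three-byte
    # shift and then one byte per letter -- the efficiency claimed above.
    assert decode(encode("ABC")) == "ABC"
    assert decode(encode("ABC \u0410\u0411\u0412")) == "ABC \u0410\u0411\u0412"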
Received on Monday, 16 September 1996 07:26:19 UTC