- From: Brian Smith <brian@briansmith.org>
- Date: Sat, 26 Jan 2008 19:36:15 -0800
- To: "'Henri Sivonen'" <hsivonen@iki.fi>, "'Frank Ellermann'" <hmdmhdfmhdjmzdtjmzdtzktdkztdjz@gmail.com>
- Cc: <public-html-comments@w3.org>
Henri Sivonen wrote:
> It is possible, but I think that developing such encodings is
> the wrong thing to do. UTF-8 can express all Unicode
> characters, so new encodings will be incompatible with
> existing software with no improvements in Unicode expressiveness.

UTF-8 is only efficient for European languages. For non-European
languages, BOCU and SCSU offer significant savings:
http://unicode.org/notes/tn6/tn6-1.html. UTF-8's design forces the
people of the world with the least money to use the most network
bandwidth and storage space.

> >> However, encoding proliferation is a problem.

If BOCU and/or SCSU were more widely supported, then legacy encodings
like TIS-620 (Thai encoded as single bytes) could reasonably fade away.

> Developers are free to waste their time on encodings when
> they do things in the RAM space of their own applications.
> Communications on the public Web affect other people, so
> developers who implement pointless stuff waste the time of
> other developers as well when they need to interoperate with
> the pointlessness.

Encodings that offer savings over UTF-8 are not a waste of time.

> > But for some scripts and applications UTF-32 could be more straight
> > forward than UTF-16.
>
> In some cases UTF-32 might be preferable in RAM. UTF-32 is
> never preferable as an encoding for transferring over the
> network. HTML5 encoded as UTF-8 is *always* more compact than
> the same document encoded as UTF-16 or UTF-32 regardless of
> the script of the content.

UTF-8 is significantly less compact than SCSU/BOCU for most people's
native languages.

- Brian
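
To make the size argument concrete, here is a rough sketch that counts the bytes needed for a short Thai string in several encodings. It is only an illustration: Python's standard library has no SCSU or BOCU-1 codec, so TIS-620 (the single-byte Thai encoding mentioned above) stands in as the compact case, and the sample string and the helper name encoded_sizes are assumptions made for this example. It measures bare text only, not a full HTML document with ASCII markup.

```python
# Sample Thai text (6 code points), written with escapes so the
# source stays ASCII. The string is an arbitrary illustration.
thai = "\u0e2a\u0e27\u0e31\u0e2a\u0e14\u0e35"


def encoded_sizes(text, encodings=("utf-8", "utf-16-le", "tis-620")):
    """Return a dict mapping each encoding name to the encoded byte count."""
    return {name: len(text.encode(name)) for name in encodings}


sizes = encoded_sizes(thai)
for name, size in sizes.items():
    print(f"{name:10s} {size} bytes")
```

Each Thai code point in this sample takes three bytes in UTF-8, two in UTF-16, and one in TIS-620. Markup-heavy documents shift the balance back toward UTF-8, because ASCII tags cost one byte each there.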
Received on Sunday, 27 January 2008 03:36:27 UTC