- From: Jon Hanna <jon@hackcraft.net>
- Date: Mon, 8 Mar 2004 12:17:13 +0000
- To: Julian Reschke <julian.reschke@gmx.de>
- Cc: Elliotte Rusty Harold <elharo@metalab.unc.edu>, Tim Bray <tbray@textuality.com>, "www-tag@w3.org" <www-tag@w3.org>
Quoting Julian Reschke <julian.reschke@gmx.de>:

> As far as I understand, UTF-16 may perform (in terms of size) much
> better for asian languages, so it seems that it makes a lot of sense if
> protocols can choose UTF-8 vs UTF-16 based on what makes most sense for
> the document content.

That is correct. East Asian and Indic languages will typically take 50% more octets to encode in UTF-8 than in UTF-16. Languages that use the Latin script will take somewhere in the region of 90%-100% more octets to encode in UTF-16 than in UTF-8.

Compression tends to act as a leveller here, but not a perfect one. So for very large quantities of text the choice of encoding can have an appreciable impact on download times, and allowing the transmitter to make that choice seems sensible.

That said, there are other encodings capable of directly encoding the entire Unicode repertoire that are more efficient in terms of stream size. They are quite complicated to process, though, which I would argue outweighs the benefit they have for download times.

--
Jon Hanna <http://www.hackcraft.net/>
"…it has been truly said that hackers have even more words for equipment failures than Yiddish has for obnoxious people." - jargon.txt
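As a rough illustration of the size figures above, the following Python sketch counts the octets needed for the same text in UTF-8 and in UTF-16. The helper name `encoded_sizes` and the sample strings are assumptions made up for this example, not anything from the thread, and the byte order mark is left out so that only the text itself is counted.

```python
def encoded_sizes(text):
    """Return (utf8_octets, utf16_octets) for text, excluding any BOM."""
    # "utf-16-le" encodes without a byte order mark, so short samples
    # are not skewed by the two extra BOM octets.
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# Hypothetical sample strings, chosen only for illustration.
samples = {
    "latin": "The quick brown fox jumps over the lazy dog",
    "japanese": "日本語のテキストはこのようになります",
    "hindi": "यह हिन्दी पाठ का एक नमूना है",
}

for name, text in samples.items():
    utf8_octets, utf16_octets = encoded_sizes(text)
    ratio = utf8_octets / utf16_octets
    print(f"{name}: UTF-8 {utf8_octets} octets, "
          f"UTF-16 {utf16_octets} octets, UTF-8/UTF-16 ratio {ratio:.2f}")
```

Real documents usually mix scripts with ASCII spaces, digits and markup, so measured ratios will tend to sit somewhat inside the extremes quoted above.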
Received on Monday, 8 March 2004 07:17:15 UTC