- From: McDonald, Ira <imcdonald@sharplabs.com>
- Date: Wed, 30 Mar 2005 08:07:10 -0800
- To: "'Chris Lilley'" <chris@w3.org>, Deborah Cawkwell <deborah.cawkwell@bbc.co.uk>
- Cc: www-international@w3.org
Hi,

And for what it's worth, the IETF formally requires that UTF-8 be supported for transferring human-readable text over any Internet protocol (including HTTP/1.1), and has done so for a _long_ time. See RFC 2277 (January 1998), which specifically prohibits (for example) UTF-16-only support (without UTF-8).

If you encode a page in UTF-16, there's a fair chance that an intermediary is going to convert it into UTF-8 before delivery anyway.

The "benefits" of UTF-16 disappeared once Plane 0 stopped being the only useful and assigned range of Unicode code points (for example, much of the interesting math and musical notation lies outside Plane 0).

Cheers,
- Ira

Ira McDonald (Musician / Software Architect)
Blue Roof Music / High North Inc
PO Box 221 Grand Marais, MI 49839
phone: +1-906-494-2434
email: imcdonald@sharplabs.com

-----Original Message-----
From: www-international-request@w3.org [mailto:www-international-request@w3.org] On Behalf Of Chris Lilley
Sent: Wednesday, March 30, 2005 9:29 AM
To: Deborah Cawkwell
Cc: www-international@w3.org
Subject: Re: Unicode encoding for web pages

On Wednesday, March 30, 2005, 2:45:27 PM, Deborah wrote:

DC> For web pages, would you consider using a Unicode encoding
DC> other than UTF-8, eg UTF-16? If so, why? or why not?

I used to consider that UTF-16 would provide a space-saving benefit for those languages where a single character runs to three or four bytes in UTF-8. It turns out that even with a fairly small amount of markup, this space saving is not seen in practice.

I understand that in well-optimised Web Services applications with high throughput, profiling shows that UTF-8 to UTF-16 conversion (e.g., to construct a DOM) can become significant, so one would imagine that shipping content in UTF-16 might help there also.

I could not see any particular reason to use UTF-7.
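A minimal Python sketch of the space-saving point above: it counts the bytes needed for the same Japanese text in UTF-8 and UTF-16, first bare and then wrapped in some typical HTML markup. The sample strings are hypothetical and chosen only for illustration. Each CJK character here costs 3 bytes in UTF-8 versus 2 in UTF-16, while each ASCII markup character costs 1 byte versus 2, so markup quickly tips the balance toward UTF-8.

```python
# Hypothetical sample strings, chosen only for illustration.
BARE = "日本語のテキスト"  # 8 CJK characters, no markup
MARKED_UP = '<p class="intro"><a href="/news/index.html">日本語のテキスト</a></p>'


def encoded_sizes(text: str) -> dict[str, int]:
    """Return the byte length of `text` in UTF-8 and in UTF-16 (without a BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-le" is used instead of "utf-16" so that the 2-byte BOM
        # Python would otherwise prepend is not counted.
        "utf-16": len(text.encode("utf-16-le")),
    }


if __name__ == "__main__":
    for label, sample in (("bare text", BARE), ("with markup", MARKED_UP)):
        sizes = encoded_sizes(sample)
        print(f"{label}: UTF-8 = {sizes['utf-8']} bytes, UTF-16 = {sizes['utf-16']} bytes")
```

With these samples the bare text is 24 bytes in UTF-8 versus 16 in UTF-16, but the marked-up fragment (52 ASCII characters plus the same 8 CJK characters) is 76 bytes versus 120.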
For material where (a) random access is a high priority and (b) there is significant usage of characters that would require surrogates, using UCS-4 might be a benefit.

So in general, and particularly for XML, where a parser is not required to understand encodings other than UTF-8 and UTF-16, I see less and less reason to use anything other than UTF-8.

--
Chris Lilley                    mailto:chris@w3.org
Chair, W3C SVG Working Group
W3C Graphics Activity Lead
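To make the surrogate and random-access points concrete (Ira's remark about Plane 0 and Chris's UCS-4 suggestion), here is a small Python sketch. It uses an arbitrarily chosen character outside Plane 0, MUSICAL SYMBOL G CLEF (U+1D11E). In UTF-16 that character needs a surrogate pair, so UTF-16 is no longer one code unit per character; UTF-32 (the Unicode-range form of UCS-4) keeps every code point at exactly 4 bytes, which is what allows direct indexing. The helper names `surrogate_pair`, `utf16_code_units` and `code_point_at` are hypothetical.

```python
import struct

# MUSICAL SYMBOL G CLEF (U+1D11E) lives in Plane 1, outside Plane 0 (the BMP).
G_CLEF = "\U0001D11E"
SAMPLE = "A" + G_CLEF + "B"  # hypothetical three-code-point string


def surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a supplementary-plane code point into its UTF-16 high/low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points outside Plane 0 need surrogates")
    offset = code_point - 0x10000  # a 20-bit value
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def utf16_code_units(text: str) -> int:
    """Count the 16-bit code units `text` occupies in UTF-16."""
    return len(text.encode("utf-16-le")) // 2


def code_point_at(utf32: bytes, index: int) -> str:
    """Constant-time random access: read the index-th code point from UTF-32-LE bytes.

    This works only because every code point occupies exactly 4 bytes.
    """
    (value,) = struct.unpack_from("<I", utf32, index * 4)
    return chr(value)


if __name__ == "__main__":
    high, low = surrogate_pair(ord(G_CLEF))
    print(f"U+{ord(G_CLEF):X} -> {high:04X} {low:04X}")  # U+1D11E -> D834 DD1E
    print(utf16_code_units(SAMPLE))        # 4: 'A', two surrogates, 'B'
    print(len(SAMPLE.encode("utf-8")))     # 6: 1 + 4 + 1 bytes
    print(code_point_at(SAMPLE.encode("utf-32-le"), 2))  # B
```

The trade-off is visible in the sizes: UTF-32 buys constant-time indexing by spending 4 bytes on every character, including plain ASCII markup.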
Received on Wednesday, 30 March 2005 16:07:31 UTC