- From: Frank Ellermann <nobody@xyzzy.claranet.de>
- Date: Thu, 10 Apr 2008 09:30:00 +0200
- To: www-international@w3.org
Addison Phillips wrote:
> I've sent the updated document to Richard for posting tomorrow.

Thanks.

[text/xml and US-ASCII]
> However, the point here is to use UTF-8 and NOT some other
> encoding. Character entities are less desirable than real
> characters.

Okay. Of course it also depends on the platform, the tools, what the user intends to do (read or edit), and what's more user-friendly in the case of unsupported characters. Admittedly my preferences are odd (= "all I need to know is the hex. codepoint number for anything above u+00FF"), and arguably RFC 5198 says "not good enough, hex. is not NFC".

[about the wonders of "redundant" charset declarations]
> Announcing it in the protocol is good because often this
> takes precedence (or other Bad Things happen if you don't
> set your server to emit the *correct* encoding declaration
> ---like it emits the wrong one).

The three servers I use at the moment are not "my" servers. One of them happily says Latin-1 for any text/html, no matter what it really is. As it happens it is either ASCII or windows-1252, never Latin-1. I don't see that the HTTP or HTML5 WGs intend to improve this situation, putting it mildly. The concept "your server" is already broken; it's the same idea as "your UA".

[Latin-1 vs. UTF-8]
> The exact amount of expansion depends on the language and
> particular text involved. Expansions for some common
> encodings might be as much as:

Right, I was curious whether the 10% was based on some reproducible results for the languages allegedly covered by Latin-1.

> j. UTF-7 reference. Changed to RFC 2152. Then washed hands.

Not fair: UTF-7 was an important step in 1994, when UTF-8 was apparently still "work in progress". An IETF AD volunteered to sponsor an Internet draft deprecating UTF-7; maybe try it.
My attempt failed. I ended up talking about UTF-32, UTF-16, UTF-8, CESU-8, UTF-1, BOCU-1, "UTF-4", UTF-EBCDIC, UNICODE-1.1, and "UTF-5", and excluding SCSU isn't the trick to get a remotely comprehensible text.

Frank
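On the question of whether the 10% Latin-1 vs. UTF-8 expansion figure is reproducible: for text that fits in ISO-8859-1, every ASCII character stays one byte in UTF-8 and every character from U+0080 to U+00FF becomes two bytes. The expansion is therefore just the share of non-ASCII characters in the sample. What follows is a minimal Python sketch that assumes the sample is representable in Latin-1. The function name `utf8_expansion` is illustrative and does not come from the thread.

```python
def utf8_expansion(text: str) -> float:
    """Relative size increase when `text` is stored as UTF-8 instead of ISO-8859-1.

    Assumes `text` only contains characters up to U+00FF; anything else makes
    text.encode("latin-1") raise UnicodeEncodeError.
    """
    latin1_bytes = len(text.encode("latin-1"))  # always one byte per character
    utf8_bytes = len(text.encode("utf-8"))      # two bytes for U+0080..U+00FF
    if latin1_bytes == 0:
        return 0.0  # empty sample: no expansion to report
    return (utf8_bytes - latin1_bytes) / latin1_bytes


if __name__ == "__main__":
    sample = "Grüße aus München"
    print(round(utf8_expansion(sample), 3))  # 0.176
```

The result depends entirely on the sample text. Running it over representative corpora for each language would give the kind of reproducible numbers asked about above.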
Received on Thursday, 10 April 2008 07:28:07 UTC