- From: Frank Ellermann <nobody@xyzzy.claranet.de>
- Date: Thu, 6 Dec 2007 23:46:27 +0100
- To: www-international@w3.org
Richard Ishida wrote:

> Please send any comments to www-international@w3.org.

| Note however that that byte may represent either é or щ,
| depending on the context.

... "represent é, щ, other characters, or no character at all depending on the charset". You'd need a definition of the shorthand "charset" first; maybe your wording "context" allows you to skip this here. My point: there are more possibilities than only é or щ.

| Most Web pages use the UTF-8 encoding for Unicode text.

..."pages and Internet protocols use"... RFC 2277 (BCP 18) is the LAW; it even has a decent deadline by which UTF-8 will replace all legacy charsets in protocols (not before 2048). Are you sure about "most Web pages" (as of today)?

| some more complicated decoding is needed

UTF-8 isn't complicated, it's brilliant. It's just not obvious to most human readers (including some folks who like modulo 16 better than modulo 64). Maybe pick only UTF-16BE as contrast, and omit UTF-32BE; the implicit message should be "there is UTF-8, anything else is doomed" (not before 2048, as noted above).

The "how does this affect me" section is rather short, and using s/Unicode/UTF-8/ everywhere does not simplify everything. E.g. I'm sure that most of my fonts can handle windows-1252, and for the reasons noted in your text I'm also sure that they can't handle all of UTF-8.

For Web pages the encoding is almost irrelevant: authors can insert any Unicode code point as an NCR anyway, and in that case picking ASCII or windows-1252 can be a valid choice. But whatever authors pick, they MUST declare it; that's the important point, addressed earlier in your text.

One of the more tricky issues is text/plain: depending on the platform it might not be possible to declare what it is, and then UTF-8 has some nice properties that allow guessing correctly. But maybe talking about "guessing" would be at odds with the goals of your text (?)

Frank
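
To make three of the points above concrete, here is a minimal Python sketch. It is an illustration only: the helper names `decode_or_none` and `guess_text_plain`, the choice of charsets, and the windows-1252 fallback are assumptions picked for the example, not anything taken from the article under review. It shows (1) the single byte 0xE9 decoding to different characters, or to nothing, depending on the charset, (2) a crude "try strict UTF-8 first" guess for undeclared text/plain, and (3) writing non-ASCII characters as NCRs in a pure-ASCII page.

```python
# Illustrative sketch; helper names and the fallback charset are hypothetical.

def decode_or_none(data: bytes, charset: str):
    """Return the decoded text, or None if the bytes are invalid in that charset."""
    try:
        return data.decode(charset)
    except UnicodeDecodeError:
        return None


def guess_text_plain(data: bytes, fallback: str = "windows-1252") -> str:
    """Guess the charset of undeclared text: strict UTF-8 first, then a legacy fallback.

    Text in a legacy 8-bit charset that contains non-ASCII bytes rarely
    happens to be valid UTF-8, which is why the strict attempt is a useful test.
    The fallback is an assumption; a real application would pick its own.
    """
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode(fallback, errors="replace")


# (1) One byte, several meanings, or no character at all.
byte = b"\xe9"
for charset in ("iso-8859-1", "iso-8859-5", "koi8-r", "utf-8"):
    print(charset, repr(decode_or_none(byte, charset)))

# (2) Guessing undeclared text/plain.
print(guess_text_plain("é щ".encode("utf-8")))  # valid UTF-8, decoded as such
print(guess_text_plain(b"caf\xe9"))             # invalid UTF-8, falls back

# (3) Any Unicode code point can be written as an NCR in an ASCII-only page.
print("é щ".encode("ascii", "xmlcharrefreplace"))
```

The guess in (2) can still be wrong for short or ASCII-only input, and it says nothing about which legacy charset was meant once UTF-8 has been ruled out, so it does not replace declaring the encoding.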
Received on Thursday, 6 December 2007 22:44:56 UTC