- From: Misha Wolf <MISHA.WOLF@reuters.com>
- Date: Wed, 27 Nov 1996 12:39:28 -0500 (EST)
- To: www-html <www-html@w3.org>, www-international <www-international@w3.org>, Unicode <unicode@unicode.org>
We have three representations:

(a) raw octets
(b) numeric character references
(c) entity names

Numeric character references are, of course, supposed to refer to Unicode/ISO 10646. The charset, whether specified via HTTP or HTML or a menu, should affect the interpretation of (a). It should *not* affect the interpretation of (b) or (c). The major browsers were broken in this regard and are being gradually fixed.

An example of a "cheesy little editor" that created lots of polluted Web pages was FrontPage 1.0. Though Microsoft sold it as suitable only for Code Page 1252, lots of people used it on other Code Pages. FP 1.0 simply exports stuff as if it were CP 1252, hence a Russian Web page ends up full of Latin 1 entity names! FP 2.0 (aka 97) has, I believe, fixed this.

The various Internet Assistants did the same foul thing. I hope they've been fixed. The pages created using these tools will presumably (?) get fixed when their authors pass them through the new versions of the tools. Can anyone confirm/deny this?

Misha
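Editorial illustration of the (a)/(b)/(c) rule above, as a minimal Python sketch rather than anything taken from a browser or from FrontPage. The function names `decode_page` and `broken_export` are hypothetical, and `broken_export` only imitates the failure mode described in the message (treating every octet as CP 1252 and writing entity names); it is an assumption about the behaviour, not the tool's actual code.

```python
import html
from html.entities import codepoint2name


def decode_page(raw: bytes, charset: str) -> str:
    """Decode a page the way the message says a correct browser should."""
    # (a) raw octets: the declared charset decides what they mean.
    text = raw.decode(charset)
    # (b) numeric references and (c) entity names: resolved against
    # Unicode/ISO 10646, independent of the charset used above.
    return html.unescape(text)


def broken_export(raw: bytes, assumed: str = "cp1252") -> str:
    """Hypothetical imitation of an editor that assumes CP 1252 for all input."""
    out = []
    for ch in raw.decode(assumed):
        name = codepoint2name.get(ord(ch))
        # Non-ASCII characters with a known entity name are written as that name.
        out.append(f"&{name};" if name and ord(ch) > 127 else ch)
    return "".join(out)


# The same octets, read under two different charsets.
page = b"\xe9 &#233; &eacute;"
as_latin1 = decode_page(page, "iso-8859-1")
as_cyrillic = decode_page(page, "iso-8859-5")

# A Cyrillic letter stored as a CP 1251 octet, pushed through the broken export.
polluted = broken_export("\u0439".encode("cp1251"))
```

Only the first character differs between `as_latin1` and `as_cyrillic`; the two references decode to U+00E9 in both. `polluted` shows how a Cyrillic letter turns into a Latin 1 entity name once the wrong code page is assumed, and why no later charset declaration can recover it.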
Received on Wednesday, 27 November 1996 08:02:16 UTC