- From: David Woolley <david@djwhome.demon.co.uk>
- Date: Mon, 5 Jun 2006 23:08:41 +0100 (BST)
- To: www-html@w3.org
> What has been in IE has been there for years...when the computing world
> was based on code pages and system locales instead of Unicode. Actually,
> that has only been some 5-7 years ago.

HTML wasn't. Its internal code page was ISO 8859/1 and that was also the default code page for HTTP. The problem was:

1) browsers actually treated both as being the recipient's platform's code page, so you got totally bogus entities, like &#154; for š, because browsers actually used Windows-1252.

2) ISO 8859/1 is USA and Western European chauvinistic, so there was no way for people in the rest of the world to create valid web pages - even specifying gb2312 in the HTTP header didn't remove the fact that you couldn't represent Chinese in the HTML internal character set (one result was that people actually used two numeric entities to represent one character!).

HTML 4 extends to ISO 10646 and makes specifying the transfer character set a SHOULD (or is it a MUST), but browsers still have to cope with legacy pages.

Character set, though, is rather technical for ordinary users, and using UTF-8 for everything bloats pages, although that is the default for true XHTML (not the Appendix C stuff that started this thread).

So, the original situation was that there was an explicit default, but it was inadequate, and the current situation is that character set should always be specified.

> Based on users needing to view pages and an ability to control the
> quality of pages that a page author may generate, the best solution for
> customers is help them view the page...even if the author or tool did
> not put in the character set used.

I thought this was supposed to be one of the main reasons why the vast majority of HTML is bad. Authors author for the intended result on the current version of IE, not to the standards.

> used (hopefully defaulting to UTF-8) and then to educate authors who are
> generating content to check that their pages are written correctly.
All attempts to educate people to even use validator.w3.org have essentially failed. It is generally only amateurs who produce valid HTML.
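
To make the two code page problems above concrete, here is a rough sketch using Python's standard codecs. The specific values are assumptions chosen for illustration (code 154 for the š case, and U+4E2D as a sample Chinese character); they are not taken from any particular legacy page.

```python
# Problem 1: the same numeric code means different things depending on
# whether it is read as ISO 8859/1 or as Windows-1252.
raw = bytes([154])                       # the code a page wrote as a numeric entity

as_latin1 = raw.decode("latin-1")        # U+009A, an unprintable C1 control
as_cp1252 = raw.decode("cp1252")         # U+0161, small s with caron

# Problem 2: a GB2312 character is two bytes, so authors ended up writing
# one numeric entity per byte instead of one entity per character.
ch = "\u4e2d"                            # sample Chinese character (assumed example)
gb_bytes = ch.encode("gb2312")           # two bytes in GB2312
two_refs = "".join(f"&#{b};" for b in gb_bytes)   # the legacy hack
one_ref = f"&#{ord(ch)};"                # what ISO 10646 numbering allows

# The UTF-8 size cost for the same character.
utf8_len = len(ch.encode("utf-8"))

print(repr(as_latin1), repr(as_cp1252))
print(two_refs, one_ref)
print(len(gb_bytes), utf8_len)
```

The last line shows the size trade-off mentioned above: the same character takes two bytes in GB2312 and three in UTF-8.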
Received on Monday, 5 June 2006 22:13:02 UTC