- From: Larry Masinter <masinter@parc.xerox.com>
- Date: Tue, 23 Jul 1996 21:00:09 PDT
- To: Albert-Lunde@nwu.edu
- CC: MOURIK@rullet.leidenuniv.nl, www-international@w3.org
To echo what others have said: "language" and "character encoding" are orthogonal. In HTML, in addition, there are two levels: "character encoding" (as specified by the MIME charset parameter) and "document character set" (as specified in HTML's SGML declaration).

> Now suppose the HTTP Charset-header is set to some Russian character
> encoding (MS codepage 1251, KOI8-R or ISO 8859-5 -- you may pick your
> choice). What happens to entities like &eacute; and &ouml;? Browsers like
> Navigator, Explorer and Mosaic will map them blindly to #233 and #246.
> And so they'll appear as arbitrary Russian characters.

The "document character set" is fixed, static, constant for HTML. &eacute; and &ouml; _should_ always map to the same characters (e with acute accent, o with umlaut), independent of what the charset encoding might be. If these implementations actually have that behavior, they're buggy and you should report the bugs.

I believe this is all expanded in:

http://ds.internic.net/internet-drafts/draft-ietf-html-i18n-04.txt

which is now in IESG 'last call'.

Larry
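A minimal Python sketch of the two levels, assuming a document body made only of ASCII bytes plus character references (so it decodes identically under all three charsets named in the quoted question). The function names `resolve_correctly`, `resolve_buggy` and `ref_codepoint` are hypothetical. The first decodes the bytes with the MIME charset and then resolves each reference to a code point in the document character set (ISO 8859-1 in HTML 2.0, extended to ISO 10646 by the i18n draft; both put é at 233 and ö at 246). The second imitates the reported bug by reusing the reference number as a byte value in the transport charset.

```python
import re
from html.entities import name2codepoint

# Matches named references (&eacute;) and decimal ones (&#233;).
REF_RE = re.compile(r"&(#\d+|[A-Za-z][A-Za-z0-9]*);")


def ref_codepoint(ref):
    """Return the document-character-set code point for a reference body, or None."""
    if ref.startswith("#"):
        cp = int(ref[1:])
        return cp if cp <= 0x10FFFF else None
    return name2codepoint.get(ref)


def resolve_correctly(raw: bytes, charset: str) -> str:
    """Decode with the MIME charset, then map references to code points.

    The charset only governs how bytes become characters; the reference
    number is always interpreted in the fixed document character set.
    """
    text = raw.decode(charset)

    def repl(m):
        cp = ref_codepoint(m.group(1))
        return m.group(0) if cp is None else chr(cp)

    return REF_RE.sub(repl, text)


def resolve_buggy(raw: bytes, charset: str) -> str:
    """Imitate the reported bug: reinterpret the number as a byte in the charset."""
    text = raw.decode(charset)

    def repl(m):
        cp = ref_codepoint(m.group(1))
        if cp is None or cp > 255:
            return m.group(0)
        # Wrong on purpose: 233 becomes whatever byte 0xE9 means in this charset.
        return bytes([cp]).decode(charset, errors="replace")

    return REF_RE.sub(repl, text)


raw = b"caf&eacute; &ouml; &#233;"
for charset in ("cp1251", "koi8_r", "iso8859_5"):
    print(charset, resolve_correctly(raw, charset), "|", resolve_buggy(raw, charset))
```

Under every charset the correct resolver yields "café ö é", while the buggy one turns the same references into Cyrillic letters that differ from charset to charset, which is the symptom described in the quoted question.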
Received on Wednesday, 24 July 1996 00:00:50 UTC