Re: LANG= for character-mapping

From: Larry Masinter <masinter@parc.xerox.com>
Date: Tue, 23 Jul 1996 21:00:09 PDT
To: Albert-Lunde@nwu.edu
CC: MOURIK@rullet.leidenuniv.nl, www-international@w3.org
Message-Id: <96Jul23.210009pdt."2733"@golden.parc.xerox.com>
To echo what others have said:

"Language" and "character encoding" are orthogonal. In addition, HTML
has two levels: the "character encoding" (specified by the MIME charset
parameter) and the "document character set" (specified in HTML's SGML
declaration).
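
A minimal sketch of the two levels, assuming Python's standard codecs
stand in for the MIME charsets. The helper names and the sample
character (Cyrillic short i, U+0439) are illustrative choices, not part
of any specification. The bytes on the wire differ per charset, but once
decoded they all denote the same character in the document character
set:

```python
# Python codec names assumed to correspond to KOI8-R, Microsoft
# code page 1251 and ISO 8859-5.
CHARSETS = ["koi8_r", "cp1251", "iso8859_5"]


def encodings_of(char):
    """Return the byte sequence for one character in each charset."""
    return {name: char.encode(name) for name in CHARSETS}


def code_points_after_decoding(char):
    """Decode every encoded form and collect the resulting code points."""
    return {ord(raw.decode(name)) for name, raw in encodings_of(char).items()}


if __name__ == "__main__":
    sample = "\u0439"  # CYRILLIC SMALL LETTER SHORT I
    for name, raw in encodings_of(sample).items():
        print(name, raw.hex())
    print(sorted(hex(cp) for cp in code_points_after_decoding(sample)))
```

The encoding level varies with the charset parameter; the character
identity recovered after decoding does not.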

> Now suppose the HTTP Charset-header is set to some Russian character-
> encoding (Ms. codepage 1251, KOI-8R or ISO 8859-5 -- you may pick your
> choice). What happens to entities like &eacute; and &ouml;? Browsers like
> Navigator, Explorer and Mosaic will map them blindly to #233 and #246.
> And so they'll appear as arbitrary Russian characters.

The "document character set" is fixed, static, constant for HTML.
&eacute; and &ouml; _should_ always map to the same characters (e
with acute accent, o with umlaut), independent of what the charset
encoding might be. If these implementations actually behave as
described, they're buggy and you should report the bugs.
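
To make the difference concrete, here is a rough sketch, assuming
Python's html.unescape as a stand-in for a conforming parser. The
function resolve_buggy is a hypothetical model of the behavior described
in the question, not code taken from any browser: it reuses the
reference's number as a byte value in the transfer charset.

```python
import html

# Charsets named in the quoted question, under Python's codec names.
LEGACY_CHARSETS = ["cp1251", "koi8_r", "iso8859_5"]


def resolve_correctly(reference):
    """Resolve a reference against the document character set."""
    return html.unescape(reference)


def resolve_buggy(reference, charset):
    """Hypothetical model of the reported bug: reuse the code point
    number as a byte value in the transfer charset."""
    code_point = ord(html.unescape(reference))
    # Only meaningful for code points below 256, as in this example.
    return bytes([code_point]).decode(charset)


if __name__ == "__main__":
    for ref in ("&eacute;", "&ouml;", "&#233;"):
        print(ref, "->", resolve_correctly(ref))
        for charset in LEGACY_CHARSETS:
            print("   buggy under", charset, "->", resolve_buggy(ref, charset))
```

Under the correct reading, &eacute; and &#233; both yield e with acute
accent whatever the charset; under the buggy reading, the result is a
Cyrillic letter that changes with the charset.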

I believe this is all expanded in:

 http://ds.internic.net/internet-drafts/draft-ietf-html-i18n-04.txt

which is now in IESG 'last call'.

Larry
Received on Wednesday, 24 July 1996 00:00:50 GMT
