
Re: LANG= for character-mapping



To echo what others have said:

"Language" and "character encoding" are orthogonal. In HTML there are,
in addition, two levels: the "character encoding" (specified by the
MIME charset parameter) and the "document character set" (specified
in HTML's SGML declaration).

> Now suppose the HTTP Charset-header is set to some Russian character-
> encoding (Ms. codepage 1251, KOI-8R or ISO 8859-5 -- you may pick your
> choice). What happens to entities like &eacute; and &ouml;? Browsers like
> Navigator, Explorer and Mosaic will map them blindly to #233 and #246.
> And so they'll appear as arbitrary Russian characters.

The "document character set" is fixed, static, constant for HTML.
&eacute; and &ouml; _should_ always map to the same characters (e
with acute accent, o with umlaut), independent of what the charset
encoding might be. If these implementations actually behave that
way, they're buggy and you should report the bugs.
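
To make the two levels concrete, here is a minimal sketch in Python. It
is only an illustration, not how any of the browsers named above work.
The function names render() and naive_numeric_lookup() are made up for
this example, and it assumes the document character set can be modelled
by Unicode code points (which coincide with ISO 8859-1 for 233 and 246).
The charset is used once, to turn bytes into characters; entity and
numeric references are resolved afterwards and never consult it.

```python
import html


def render(raw: bytes, charset: str) -> str:
    """Hypothetical helper: decode first, resolve references second."""
    # Level 1: the MIME charset says how bytes map to characters.
    text = raw.decode(charset)
    # Level 2: references are resolved against the document character
    # set (modelled here by Unicode), independent of the charset.
    return html.unescape(text)


def naive_numeric_lookup(number: int, charset: str) -> str:
    """The buggy behaviour: treat the reference number as a byte value."""
    return bytes([number]).decode(charset)


source = "Привет: &eacute; &#233; &ouml; &#246;"

for charset in ("koi8-r", "cp1251", "iso8859-5"):
    raw = source.encode(charset)
    # Correct: the same accented Latin letters under every charset.
    print(charset, render(raw, charset))
    # Buggy: a charset-dependent Cyrillic letter instead of e-acute.
    print(charset, naive_numeric_lookup(233, charset))
```

The design point is the ordering: once the bytes are decoded, the
charset has done its job, so &#233; means the same character whether
the page arrived as KOI8-R, codepage 1251 or ISO 8859-5.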

I believe this is all expanded in:

 http://ds.internic.net/internet-drafts/draft-ietf-html-i18n-04.txt

which is now in IESG 'last call'.

Larry

