Date: Fri, 22 Aug 1997 11:31:01 +0200 (MET DST)
From: Martin J. Dürst <email@example.com>
To: "E. Stephen Mack" <firstname.lastname@example.org>
cc: email@example.com
In-Reply-To: <firstname.lastname@example.org>
Message-ID: <Pine.SUN.3.96.970822110816.703O-100000@enoshima>
Subject: Re: Disturbing IE 4.0pp2 behavior for lang="en"

On Fri, 22 Aug 1997, E. Stephen Mack wrote:

> I was testing the new HTML 4.0 entities in IE 4.0 pp2.
> Here's a fragment of a test document:
>
> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">
> <HTML LANG="EN">
> <HEAD>
> <meta http-equiv="Content-Type" value="text/html; charset='UTF-8'">
                                  ^^^^^
> <TITLE>Background Color Name Test</TITLE>
> </HEAD>
> <BODY>
> &radic;
> &beta;
> </BODY>
> </HTML>

First, get your syntax right. Instead of "value" above, please
write "content".

> First, IE 4.0 does not automatically pick the UTF-8 font to
> display the entities. Instead, the viewer must manually pick
> the font using View | Font | Universal Alphabet (UTF-8).

It may be that this is a one-time setting. In NN e.g. you can
choose what font you want to use for JIS encoding, and so on.
IE probably has something similar. It may not be possible to
initialize this automatically.

> Second, the presence of the LANG="EN" attribute overrides the
> manual font choice. IE 4.0 pp2 will refuse to display the entities
> using the UTF font, unless the LANG attribute is removed.

That is extremely strange. Strictly speaking, there is no MUST
rule that all characters must be displayed, and indeed display
details can be made dependent on the LANG attribute (for example,
a different font could be used for French than for English, in
the same way different fonts are used in paperbacks,...).

> It's my understanding that specifying a language with the LANG
> attribute should *NOT* have an influence over the character
> encoding that is picked to display the document, is that right?

Definitely. The character encoding is architecturally independent.
There are strong correlations between languages and character
encodings, but these correlations should express themselves only
in the statistical distribution of authored documents available
on the net, and should not have any influence on display.

Also, &radic; and &beta; are absolutely independent of character
encoding (charset). You can put them in a pure ASCII document
(which in fact you have done), you can put them in an iso-2022-jp
document, an EBCDIC document, whatever. They always refer to the
same characters, and a browser should make a strong effort to
display them. From IE and NN, I would expect to change fonts for
such characters on the fly if they are not in the currently used
fonts.

Also, the fact that a language is labeled should not reduce the
characters displayed. Otherwise, we will have difficulties getting
people to even start to language-label their documents. There may
be some software around (I have once seen such an effect in
WordPerfect, but that was a long time ago) that tries to say:
Beta can't possibly be English, so we won't display it. This is
very faulty. A single beta can very well be part of an English
text, e.g. about some math subject. Trying to come up with a set
of allowed characters for each language that would be consistent
across the net is really impossible.

> If so, I will forward this as a bug to Microsoft. I think it's
> important that authors be able to follow HTML 4.0's recommended
> practice of specifying the language with the LANG attribute without
> losing any entity support.

Please first try with "content" instead of "value".

Regards, Martin.
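
The point that &radic; and &beta; refer to the same characters in any
charset can be illustrated with a minimal Python sketch. It is only an
illustration under stated assumptions: the list of encodings and the
helper name decode_references are hypothetical choices, not anything
from this thread, and html.unescape stands in for whatever a browser
does when it resolves character references.

```python
import html

# The test fragment consists only of ASCII characters; the references
# name the characters instead of encoding them directly.
fragment = "&radic; &beta;"

# Assumed sample of document charsets for illustration, including
# iso-2022-jp and an EBCDIC code page (cp037).
encodings = ["ascii", "utf-8", "iso-8859-1", "iso2022_jp", "cp037"]


def decode_references(raw: bytes, encoding: str) -> str:
    """Decode the bytes using the document charset, then resolve references."""
    return html.unescape(raw.decode(encoding))


# Serialize the same fragment in each charset and read it back.
results = {
    enc: decode_references(fragment.encode(enc), enc) for enc in encodings
}

for enc, text in results.items():
    # Print the code points so the comparison does not depend on fonts.
    print(enc, [f"U+{ord(ch):04X}" for ch in text])
```

Each charset yields the same code points, U+221A and U+03B2 separated
by a space. The bytes on the wire differ (notably for EBCDIC), but the
characters the references name do not.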