Re: Disturbing IE 4.0pp2 behavior for lang="en" from Martin J. Dürst on 1997-08-22 (www-html@w3.org from August 1997)

From: Martin J. Dürst <mduerst@ifi.unizh.ch>
Date: Fri, 22 Aug 1997 11:31:01 +0200 (MET DST)
To: "E. Stephen Mack" <estephen@emf.net>
cc: www-html@w3.org
Message-ID: <Pine.SUN.3.96.970822110816.703O-100000@enoshima>

On Fri, 22 Aug 1997, E. Stephen Mack wrote:

> I was testing the new HTML 4.0 entities in IE 4.0 pp2.
> Here's a fragment of a test document:
> 
> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">
> <HTML LANG="EN">
> <HEAD>
> <meta http-equiv="Content-Type" value="text/html; charset='UTF-8'">
                                  ^^^^^
> <TITLE>Background Color Name Test</TITLE>
> </HEAD>
> <BODY>
> &radic;
> &beta;
> </BODY>
> </HTML>

First, get your syntax right. Instead of "value" above, please
write "content".
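
To illustrate why the attribute name matters, here is a minimal
Python sketch of a reader that looks for the "content" attribute
of the Content-Type META element. It is not what IE actually does
internally; the names TEMPLATE, MetaContentType and
declared_content_type are made up for this illustration, and only
the standard html.parser module is assumed.

```python
from html.parser import HTMLParser

# The META line from the test document, with the attribute name left open.
TEMPLATE = """<meta http-equiv="Content-Type" {attr}="text/html; charset='UTF-8'">"""


class MetaContentType(HTMLParser):
    """Remember the content attribute of <meta http-equiv="Content-Type">."""

    def __init__(self):
        super().__init__()
        self.content = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attributes = dict(attrs)
        http_equiv = (attributes.get("http-equiv") or "").lower()
        if tag == "meta" and http_equiv == "content-type":
            self.content = attributes.get("content")


def declared_content_type(markup):
    """Return the declared content type string, or None if there is none."""
    parser = MetaContentType()
    parser.feed(markup)
    return parser.content


print(declared_content_type(TEMPLATE.format(attr="value")))
print(declared_content_type(TEMPLATE.format(attr="content")))
```

With "value", such a reader finds no declaration at all, so the
charset information is never seen.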

> First, IE 4.0 does not automatically pick the UTF-8 font to
> display the entities.  Instead, the viewer must manually pick
> the font using View | Font | Universal Alphabet (UTF-8).

This may be a one-time setting. In NN, for example, you can
choose which font to use for JIS encoding, and so on; IE
probably has something similar. It may not be possible to
initialize this automatically.

> Second, the presence of the LANG="EN" attribute overrides the
> manual font choice.  IE 4.0 pp2 will refuse to display the entities
> using the UTF font, unless the LANG attribute is removed.

That is extremely strange. Strictly speaking, there is no
MUST rule requiring that all characters be displayed, and indeed
display details can be made dependent on the LANG attribute
(for example, a different font could be used for French than
for English, in the same way different fonts are used in
paperbacks, ...).

> It's my understanding that specifying a language with the LANG
> attribute should *NOT* have an influence over the character
> encoding that is picked to display the document, is that right?

Definitely. The character encoding is architecturally independent.
There are strong correlations between languages and character
encodings, but these correlations should express themselves
only in the statistical distribution of authored documents
available on the net, and should not have any influence on
display.

Also, &radic; and &beta; are absolutely independent of character
encoding (charset). You can put them in a pure ASCII document
(what in fact you have done), you can put them in an iso-2022-jp
document, an EBCDIC document, whatever. They always refer to
the same characters, and a browser should make a strong effort
to display them. I would expect IE and NN to change fonts on
the fly for such characters if they are not in the currently
used fonts.
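
As a rough illustration of that independence, the following Python
sketch stores the same two entity references in several encodings
and resolves them after decoding. The function name resolve and
the choice of Python codec names (cp037 standing in for EBCDIC)
are assumptions made for this example only.

```python
import html

# The two entity references from the test document, as plain ASCII text.
FRAGMENT = "&radic; &beta;"


def resolve(charset):
    """Encode FRAGMENT in `charset`, decode it again, and return the
    code points that the entity references resolve to."""
    raw = FRAGMENT.encode(charset)   # bytes as they would travel on the wire
    text = raw.decode(charset)       # what the browser sees after decoding
    resolved = html.unescape(text)   # entity references -> characters
    return [f"U+{ord(ch):04X}" for ch in resolved if ch != " "]


for charset in ("ascii", "utf-8", "iso2022_jp", "cp037"):
    print(charset, resolve(charset))
```

The bytes differ between encodings (the cp037 form shares no bytes
with the ASCII form), but the characters referred to are the same
in every case: U+221A and U+03B2.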

Also, the fact that a language is labeled should not reduce
the characters displayed. Otherwise, we will have difficulties
getting people to even start to language-label their documents.

There may be some software around (I once saw such an effect
in WordPerfect, but that was a long time ago) that tries to say:
"Beta can't possibly be English, so we won't display it." This
is badly mistaken. A single beta can very well be part of an
English text, e.g. about some math subject. Trying to come
up with a set of allowed characters for each language that
would be consistent across the net is really impossible.

> If so, I will forward this as a bug to Microsoft.  I think it's
> important that authors be able to follow HTML 4.0's recommended
> practice of specifying the language with the LANG attribute without
> losing any entity support.

Please first try with "content" instead of "value".

Regards,	Martin.

Received on Friday, 22 August 1997 05:30:55 UTC