Re: Disturbing IE 4.0pp2 behavior for lang="en"

Chris Wendt (christw@microsoft.com)
Fri, 22 Aug 1997 20:19:49 -0700


From: "Chris Wendt" <christw@microsoft.com>
To: "Misha Wolf" <misha.wolf@reuters.com>
Cc: "www-html" <www-html@w3.org>
Date: Fri, 22 Aug 1997 20:19:49 -0700
Message-ID: <01bcaf73$6a2c4fc0$8b53369d@christw3.dns.microsoft.com>
Subject: Re: Disturbing IE 4.0pp2 behavior for lang="en"

This would indeed be disturbing. Fortunately, I cannot reproduce this
behavior in current builds.

To understand better what is happening:

IE4 does indeed interpret the LANG attribute to do Han disambiguation, that
is, to choose a Korean, Chinese (Traditional), Chinese (Simplified) or
Japanese font for a Chinese character. We may have done a bit too much of it
in PP2, affecting even non-Han characters and even languages other than
zh/ja/ko. The LANG attribute certainly never changes the encoding of the
document.
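
To make the intended behavior concrete, here is a rough sketch in Python.
It is an illustration only, not IE's actual code: the font names, the
language tags used as keys, and the helper names are all made up, and the
Han test covers just the main CJK Unified Ideographs block.

```python
# Hypothetical sketch of Han disambiguation by LANG; not IE's real code.
# Font names and language-tag keys are placeholders for illustration.
HAN_FONT_BY_LANG = {
    "ja": "ExampleJapaneseFont",
    "ko": "ExampleKoreanFont",
    "zh-tw": "ExampleTraditionalChineseFont",
    "zh-cn": "ExampleSimplifiedChineseFont",
}


def is_han(ch):
    """True for the main CJK Unified Ideographs block (a simplification)."""
    return 0x4E00 <= ord(ch) <= 0x9FFF


def pick_font(ch, lang, default_font):
    """Return the font for ch; LANG is consulted only for Han characters."""
    if lang and is_han(ch):
        return HAN_FONT_BY_LANG.get(lang.lower(), default_font)
    # Non-Han characters, and documents without LANG, keep the default font.
    return default_font
```

With this rule, LANG="EN" on a paragraph containing &beta; would leave the
font untouched, which is what should happen.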

For the effect you saw when switching between UTF-8 and iso-8859-1: this
action, among other things, changes the default font applied to the
document. If the respective default font claims in its font signature that
it supports the script of the character in question, IE uses the default
font regardless of whether the font _really_ has the glyph for the
character.
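
The rule can be modeled roughly as follows in Python. This is a sketch
under stated assumptions, not IE's implementation: the Font class, the
script names, and the font names are hypothetical, and the fallback search
used when the default font does not claim the script is my own guess added
only to make the example complete.

```python
# Hypothetical model of the selection rule described above; not IE's code.
class Font:
    def __init__(self, name, claimed_scripts, glyphs):
        self.name = name
        self.claimed_scripts = set(claimed_scripts)  # what the signature says
        self.glyphs = set(glyphs)                    # what is really present


def select_font(ch, script, default_font, other_fonts):
    """Return (font name, glyph present?) for ch in the given script."""
    if script in default_font.claimed_scripts:
        # The signature is trusted; the glyph table is never consulted.
        return default_font.name, ch in default_font.glyphs
    # Assumed fallback: first other font whose signature claims the script.
    for font in other_fonts:
        if script in font.claimed_scripts:
            return font.name, ch in font.glyphs
    return default_font.name, ch in default_font.glyphs


# A default font that claims Greek but only carries the trademark sign.
overclaiming = Font("ExampleDefault", {"latin", "greek"}, {"\u2122"})
honest = Font("ExampleGreek", {"greek"}, {"\u03b2"})

result = select_font("\u03b2", "greek", overclaiming, [honest])
print(result)  # ('ExampleDefault', False): beta would show as a hollow box
```

The honest font is never reached, because the default font's signature
already claims the script.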

So if you file a bug report against entities, I would appreciate it if you
also mentioned the default font that you had assigned to each script (under
View.Options.Fonts). The TrueType specification allows a font to claim
support for a script without providing every single glyph of that script.
The only way to work around this is to use a font that you know contains
the glyph and whose signature tells the truth. Most commercial fonts I have
seen so far tell the truth and contain most of the glyphs of the scripts
they claim to support.

For most of the named entities it is advisable to have the standard fonts
that you get with the Windows 95 Multilanguage support or the IE3/IE4
Pan-European language pack. Pan-European support is included in Windows NT
by default.

Chris..

-----Original Message-----
From: Misha Wolf <misha.wolf@reuters.com>
To: Chris Wendt <christw@microsoft.com>
Cc: www-html <www-html@w3.org>
Date: Friday, August 22, 1997 3:23 AM
Subject: Re: Disturbing IE 4.0pp2 behavior for lang="en"


>Chris,
>
>The following is very worrying.
>
>Misha
>
>Stephen Mack wrote:
>>
>> Here is a minimal (corrected!) example.
>>
>> First, run IE 4.0 pp2.  Select the UTF font by using the
>> View | Fonts | Universal Alphabet (UTF-8) command.
>>
>> Then view this HTML document:
>>
>> <HTML>
>> <meta http-equiv="Content-Type" content="text/html; charset='UTF-8'">
>> <TITLE>Entities</TITLE>
>> <BODY>
>> <P>
>> &trade; &radic; &beta;
>> <P LANG="EN">
>> &trade; &radic; &beta;
>> </BODY>
>> </HTML>
>>
>> The only difference between these two paragraphs is that the
>> second specifies a language with the LANG attribute.  The
>> first paragraph will display correctly, but only the trademark
>> symbol is displayed in the second paragraph.  (The other two
>> entities are displayed with a hollow box.)
>>
>> The bug also occurs if the LANG is specified in the
>> <HTML> tag or the <BODY> tag.
>> --
>> E. Stephen Mack <estephen@emf.net>    http://www.emf.net/~estephen/