Re: Disturbing IE 4.0pp2 behavior for lang="en"

E. Stephen Mack (estephen@emf.net)
Fri, 22 Aug 1997 03:59:13 -0700


Message-Id: <3.0.3.32.19970822035913.00fe8914@emf.net>
Date: Fri, 22 Aug 1997 03:59:13 -0700
To: www-html@w3.org
From: "E. Stephen Mack" <estephen@emf.net>
In-Reply-To: <B0000614447@ritig6.rit.reuters.com>
Subject: Re: Disturbing IE 4.0pp2 behavior for lang="en"

At 11:36 AM 8/22/97 +0000, Misha Wolf <misha.wolf@reuters.com> wrote:
>Could you retry without the "'" characters and tell us what happens?

Yes.  In fact, if you use just:

<TITLE>Entities</TITLE>
<BODY>
<P>
&trade; &radic; &beta;
<P LANG="EN">
&trade; &radic; &beta;

...that's all you need to show that the bug is present (as long
as you are using the UTF-8 font setting; if you use Western as
the font, then both paragraphs are rendered equally incorrectly).

(IE 4 behaves the same whether or not the <!DOCTYPE> declaration is
present, but to make the above HTML strictly legal, it should begin
with a DOCTYPE that refers to the HTML 4.0 DTD, which is where these
entities are defined.)

The simple fix will be for Microsoft to stop making the LANG="EN"
attribute imply a switch to the Western font.

The more complex -- but better -- fix will be for Microsoft to switch
fonts as necessary (even in-line) to display the entities requested
by the document.

As Martin J. Dürst pointed out, if I'm referring to Greek symbols
and the radical sign for a math lesson, it doesn't matter what
character encoding or font or language the document says is being used
-- the requested characters are named entities which are universal,
and they *must* be displayed by the browser if possible.  Since IE
clearly *can* display the characters, for it not to do so is
a violation of the HTML 4.0 specification (and perverse besides).
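
To make the "universal" point concrete, here is a minimal sketch
(in Python, purely for illustration and not part of the test case
above) that looks up the three entities in the standard library's
entity table.  The helper name describe() is my own invention.  Each
name maps to a single Unicode code point, independent of any font,
language or encoding setting, and can equally be written as a numeric
character reference:

```python
from html.entities import name2codepoint


def describe(name: str) -> str:
    """Show a named entity, its Unicode code point, and the
    equivalent decimal numeric character reference."""
    cp = name2codepoint[name]  # raises KeyError for unknown names
    return f"&{name}; -> U+{cp:04X} -> &#{cp};"


for name in ("trade", "radic", "beta"):
    print(describe(name))
```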

I should point out again that Navigator 4.02 is even worse than IE,
since it displays the literal characters &trade; &radic; &beta;
without making any attempt to display these valid HTML 4.0
named entities.

IE 3.02 and Navigator 3.02 are off the hook since they are
under no obligation to display entities from a version of HTML
that they precede.  (IE 3.02 does recognize the &trade; entity
anyway.)

                          * * *

Side issue -- given the hugeness of the Unicode character set
and the convenience of named entities, I predict that future
versions of HTML may add new named entities.  Perhaps the
HTML 4.0 spec could add a section telling user agents how to
treat unrecognized named entities.

If HTML 5.0 introduces the &foo; named entity, how should a
pre-HTML 5.0 browser treat &foo; if it sees it?  Displaying
the literal sequence "&foo;" strikes me as a less-than-optimal
solution.

(Perhaps a universal fallback mechanism to access a central W3C
standard library of entities, with images of the correct glyphs
if no font is available?)
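
As a rough illustration of what such a rule might look like, here is
a sketch (again in Python) of an entity expander that takes a
pluggable fallback for names it does not recognize.  Everything here
is hypothetical: expand_entities(), its fallback parameter and the
bracketed placeholder are my own assumptions, not anything the HTML
4.0 draft specifies, and a real user agent might consult a central
entity library or show a glyph image instead.

```python
import re
from html.entities import name2codepoint

# Matches a named entity reference such as &beta; (not numeric ones).
ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def expand_entities(text, fallback=lambda name: f"[{name}]"):
    """Replace known named entities with their characters.

    Unknown names are handed to `fallback` (a hypothetical policy
    hook) rather than being left as the literal "&name;" sequence.
    """
    def replace(match):
        name = match.group(1)
        if name in name2codepoint:
            return chr(name2codepoint[name])
        return fallback(name)

    return ENTITY.sub(replace, text)


print(expand_entities("&trade; &radic; &beta; &foo;"))
```

Passing a different fallback (for example, one that returns a
replacement character) changes the policy without touching the
expansion logic, which is the part a specification would need to pin
down.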
-- 
E. Stephen Mack <estephen@emf.net>    http://www.emf.net/~estephen/