Re: Disturbing IE 4.0pp2 behavior for lang="en" from E. Stephen Mack on 1997-08-22 (www-html@w3.org from August 1997)

From: E. Stephen Mack <estephen@emf.net>
Date: Fri, 22 Aug 1997 03:59:13 -0700
To: www-html@w3.org
Message-Id: <3.0.3.32.19970822035913.00fe8914@emf.net>

At 11:36 AM 8/22/97 +0000, Misha Wolf <misha.wolf@reuters.com> wrote:
>Could you retry without the "'" characters and tell us what happens?

Yes.  In fact, if you use just:

<TITLE>Entities</TITLE>
<BODY>
<P>
&trade; &radic; &beta;
<P LANG="EN">
&trade; &radic; &beta;

...that's all you need to show that the bug is present (as long
as you are using the UTF-8 font setting; if you use Western as
the font, then both paragraphs are rendered equally incorrectly).

(It's also irrelevant to IE 4 whether the <!DOCTYPE> declaration is
present or not, but to make the above HTML strictly legal, the
document should begin with a DOCTYPE that refers to the HTML 4.0
DTD, since that DTD is what defines these entities.)

The simple fix would be for Microsoft to stop making the LANG="EN"
attribute imply a switch to the Western font.

The more complex -- but better -- fix would be for Microsoft to switch
fonts as necessary (even in mid-line) to display the entities requested
by the document.

As Martin J. Dürst pointed out, if I'm referring to Greek symbols
and the radical sign for a math lesson, it doesn't matter what
character encoding, font, or language the document says is being used
-- the requested characters are named entities, which are universal,
and they *must* be displayed by the browser if possible.  Since IE
clearly *can* display the characters, its failure to do so is a
violation of the HTML 4.0 specification (and perverse besides).
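To make the "universal" point concrete: each of these names denotes one fixed Unicode code point, whatever LANG or the font setting says. A minimal sketch using Python's standard `html.entities.name2codepoint` table (which includes the HTML 4 entity set); the helper `entity_code_points` is a hypothetical name used only for illustration:

```python
from html.entities import name2codepoint

def entity_code_points(names):
    """Look up the Unicode code point for each HTML entity name."""
    return {name: name2codepoint[name] for name in names}

# The three entities from the test case; LANG and font play no part here.
points = entity_code_points(["trade", "radic", "beta"])
for name, cp in points.items():
    print(f"&{name}; -> U+{cp:04X} {chr(cp)}")
```

This prints U+2122, U+221A and U+03B2. The lookup takes only the entity name as input, so nothing about the character depends on the document's language tag.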

I should point out again that Navigator 4.02 is even worse than IE,
since it displays the literal text "&trade; &radic; &beta;"
without making any attempt to render these valid HTML 4.0
named entities.

IE 3.02 and Navigator 3.02 are off the hook since they are
under no obligation to display entities from a version of HTML
that they precede.  (IE 3.02 does recognize the &trade; entity
anyway.)

                          * * *

Side issue -- given the sheer size of the Unicode character set
and the convenience of named entities, I predict that future
versions of HTML may add new named entities.  Perhaps the
HTML 4.0 spec could add a section telling user agents how to
treat unrecognized named entities.

If HTML 5.0 introduces the &foo; named entity, how should a
pre-HTML 5.0 browser treat &foo; if it sees it?  Displaying
the literal sequence "&foo;" strikes me as a less-than-optimal
solution.

(Perhaps a universal fallback mechanism to access a central W3C
standard library of entities, with images of the correct glyphs
if no font is available?)
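One way a user agent might handle unrecognized names, sketched in Python under stated assumptions: `resolve_entities` and its `fallback` hook are hypothetical names invented here, and the hook merely stands in for the proposed central entity library (no such service is assumed to exist). Known names become characters; unknown ones go to the hook, and anything still unresolved becomes U+FFFD and is reported back instead of being shown as raw "&foo;":

```python
import re
from html.entities import name2codepoint

# Simplified named-entity pattern: "&", a name, ";" (numeric refs ignored).
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

def resolve_entities(text, known=name2codepoint, fallback=None):
    """Replace named entities in text; return (new_text, unresolved_names).

    `fallback` is a hypothetical hook (name -> str or None) standing in
    for a shared entity library; it is consulted only for unknown names.
    """
    unresolved = []

    def replace(match):
        name = match.group(1)
        if name in known:
            return chr(known[name])
        if fallback is not None:
            resolved = fallback(name)
            if resolved is not None:
                return resolved
        # Neither table knows it: mark the spot and remember the name.
        unresolved.append(name)
        return "\ufffd"

    return ENTITY_RE.sub(replace, text), unresolved

page, missing = resolve_entities("&beta; &foo;")
print(missing)  # ['foo']
```

Returning the unresolved names lets the browser report or look them up later, rather than silently printing the markup.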
-- 
E. Stephen Mack <estephen@emf.net>    http://www.emf.net/~estephen/

Received on Friday, 22 August 1997 06:58:50 UTC