- From: E. Stephen Mack <estephen@emf.net>
- Date: Fri, 22 Aug 1997 03:59:13 -0700
- To: www-html@w3.org
At 11:36 AM 8/22/97 +0000, Misha Wolf <misha.wolf@reuters.com> wrote:
>Could you retry without the "'" characters and tell us what happens?

Yes. In fact, if you use just:

  <TITLE>Entities</TITLE>
  <BODY>
  <P> &trade; &radic; &beta;
  <P LANG="EN"> &trade; &radic; &beta;

...that's all you need to show that the bug is present (as long as you are using the UTF-8 font setting; if you use Western as the font, then both paragraphs are rendered equally incorrectly).

(It also makes no difference to IE 4 whether the <!DOCTYPE> declaration is present or not, but to make the above HTML strictly legal, it should refer to the HTML 4.0 DTD with a DOCTYPE in order to capture the entities.)

The simple fix will be for Microsoft to stop making the LANG="EN" attribute imply a switch to the Western font. The more complex -- but better -- fix will be for Microsoft to switch fonts as necessary (even in-line) to display the entities requested by the document.

As Martin J. Dürst pointed out, if I'm referring to Greek symbols and the radical sign for a math lesson, it doesn't matter what character encoding or font or language the document says is being used -- the requested characters are named entities which are universal, and they *must* be displayed by the browser if possible. Since IE clearly *can* display the characters, for it not to do so is clearly a violation of the HTML 4.0 specification (and perverse besides).

I should point out again that Navigator 4.02 is even worse than IE, since it displays the literal characters "&trade; &radic; &beta;" without making any attempt to display these valid HTML 4.0 named entities.

IE 3.02 and Navigator 3.02 are off the hook since they are under no obligation to display entities from a version of HTML that they precede. (IE 3.02 does recognize the &trade; entity anyway.)

* * *

Side issue -- given the hugeness of the Unicode character set and the convenience of named entities, I predict that future versions of HTML may add new named entities.
Perhaps the HTML 4.0 spec could add a section telling user agents how to treat unrecognized named entities. If HTML 5.0 introduces the &foo; named entity, how should a pre-HTML 5.0 browser treat &foo; if it sees it? Displaying the literal sequence "&foo;" strikes me as a less-than-optimal solution. (Perhaps a universal fallback mechanism to access a central W3C standard library of entities, with images of the correct glyphs if no font is available?)

--
E. Stephen Mack <estephen@emf.net>
http://www.emf.net/~estephen/
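
To make the "&foo;" question concrete, here is a minimal sketch in Python of one possible policy for a user agent: expand the named entities it knows and pass unknown ones through unchanged. The function name expand_entities and the pass-through fallback are assumptions made for illustration, not anything the HTML 4.0 spec prescribes; the entity table is Python's standard html.entities.name2codepoint, which covers the HTML 4 names.

```python
import re
from html.entities import name2codepoint

# Matches a named entity reference such as &beta; and captures the name.
ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

def expand_entities(text, known=name2codepoint):
    """Replace known named entities with their characters.

    Unknown names (e.g. a hypothetical &foo;) are left as literal text,
    which is the fallback the message above calls less than optimal.
    """
    def replace(match):
        name = match.group(1)
        if name in known:
            return chr(known[name])
        # Assumed fallback policy: keep the unrecognized reference visible.
        return match.group(0)
    return ENTITY.sub(replace, text)

print(expand_entities("&trade; &radic; &beta; &foo;"))
```

A different fallback (a placeholder glyph, or a lookup in some shared entity registry as suggested above) would only change the branch that handles unknown names.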
Received on Friday, 22 August 1997 06:58:50 UTC