[whatwg] Entity parsing [trema/diaeresis vs umlaut]

From: Øistein E. Andersen <html5@xn--istein-9xa.com>
Date: Tue, 26 Jun 2007 22:55:20 +0200
Message-ID: <E1I3I4G-000JYA-9H@node1-2.ouvaton.local>
On 26 Jun 2007, at 7:49AM, Křištof Želechovski wrote:

> Internet Explorer apparently chose to support English natively
> while SGML preferred remaining language-agnostic.

To be fair, this is not how things developed.

Microsoft first chose to make the semicolon optional not only
when allowed by SGML rules (notably before whitespace and tags),
but in any position, for all named entities /that existed at the time/,
i.e., latin-1.

Unfortunately, this meant that new entities could not be added without
changing the interpretation of already existing pages (e.g., if a page
contained “less&less”, adding the entity &le to the list would result in
its being interpreted as “less≤ss”), although most of the entities have
names that are rather unlikely to appear by chance, and the ampersand
“should” be spelt &amp;.
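
To make the "less&less" case concrete, here is a minimal Python sketch of a
decoder that treats the semicolon as optional in any position. The function
name, the two tiny sample tables and the longest-prefix matching rule are
assumptions made for illustration; this is not IE's actual algorithm.

```python
import re

def decode_lenient(text, entities):
    """Decode &name references, with the semicolon optional anywhere.

    Assumption: the longest known name that prefixes the run of
    letters/digits after '&' wins.
    """
    def replace(match):
        run, semi = match.group(1), match.group(2)
        for end in range(len(run), 0, -1):
            name = run[:end]
            if name in entities:
                if end == len(run):
                    # Whole run is an entity name; a trailing ';' is consumed.
                    return entities[name]
                # Only a prefix matched; keep the remaining characters.
                return entities[name] + run[end:] + semi
        return match.group(0)  # no known name: leave the text untouched

    return re.sub(r"&([A-Za-z0-9]+)(;?)", replace, text)

# Hypothetical sample tables; the real lists are far longer.
LATIN1_SAMPLE = {"amp": "&", "eacute": "\u00e9"}
EXTENDED_SAMPLE = dict(LATIN1_SAMPLE, le="\u2264")

before = decode_lenient("less&less", LATIN1_SAMPLE)    # "less&less"
after = decode_lenient("less&less", EXTENDED_SAMPLE)   # "less≤ss"
```

The same input decodes differently once "le" joins the table, which is the
compatibility problem described above.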

Microsoft did not dare to risk this, so entities beyond latin-1 require
a semicolon in IE, even in cases where it is optional according
to SGML (and therefore will pass HTML 4.01 validation, I might add).
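
The resulting two-tier behaviour can be sketched roughly as follows, again in
Python. The table names LEGACY and STRICT, their contents, and the
longest-prefix rule are hypothetical stand-ins; the sketch only mirrors the
rule as described in this message, not IE's real implementation.

```python
import re

# Hypothetical sample tables; the real lists are far longer.
LEGACY = {"amp": "&", "copy": "\u00a9", "eacute": "\u00e9"}  # semicolon optional
STRICT = {"le": "\u2264", "hellip": "\u2026"}                # semicolon required

def decode_two_tier(text):
    """Decode entities: legacy names leniently, later names only with ';'."""
    def replace(match):
        run, semi = match.group(1), match.group(2)
        # A properly terminated reference may come from either table.
        if semi and run in LEGACY:
            return LEGACY[run]
        if semi and run in STRICT:
            return STRICT[run]
        # Otherwise only legacy names are recognised, by longest prefix.
        for end in range(len(run), 0, -1):
            if run[:end] in LEGACY:
                return LEGACY[run[:end]] + run[end:] + semi
        return match.group(0)  # leave unrecognised text untouched

    return re.sub(r"&([A-Za-z0-9]+)(;?)", replace, text)
```

Under these assumptions "less&less" and "a &le b" are left alone, "a &le; b"
is decoded, and "caf&eacute" still works without its semicolon. Note that
"a &le b" stays undecoded even though SGML would allow the semicolon to be
omitted before whitespace.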

-- 
Øistein E. Andersen
Received on Tuesday, 26 June 2007 13:55:20 UTC