
[whatwg] Entity parsing [trema/diaeresis vs umlaut]

From: Křištof Želechovski <giecrilj@stegny.2a.pl>
Date: Wed, 27 Jun 2007 21:45:20 +0200
Message-ID: <005001c7b8f3$b1d4de70$1a01080a@POCZTOWIEC>
How does this affect the case of fianc&eacutee vs. &oeliguvre? The only
difference is that the first one is used in English.
Chris

-----Original Message-----
From: Oistein E. Andersen [mailto:html5@xn--istein-9xa.com] 
Sent: Tuesday, June 26, 2007 10:55 PM
To: giecrilj at stegny.2a.pl; whatwg at whatwg.org
Subject: Re: [whatwg] Entity parsing [trema/diaeresis vs umlaut]

On 26 Jun 2007, at 7:49AM, Křištof Želechovski wrote:

> Internet Explorer apparently chose to support English natively
> while SGML preferred remaining language-agnostic.

To be fair, this is not how things developed.

Microsoft first chose to make the semicolon optional not only
when allowed by SGML rules (notably before whitespace and tags),
but in any position, for all named entities /that existed at the time/,
i.e., the Latin-1 entities.

Unfortunately, this meant that new entities could not be added without
changing the interpretation of already existing pages (e.g., if a page
contained "less&less", adding the entity &le to the list would result in its
being interpreted as "less≤ss"), although most of the entities have names
that are rather unlikely to appear by chance, and the ampersand "should" be
spelt &amp;.
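To make the "less&less" hazard concrete, here is a minimal sketch of a decoder that accepts entity names without a trailing semicolon. It assumes a greedy longest-match of known names at each ampersand; that matching strategy, the function name `decode_no_semicolon`, and the tiny tables are hypothetical illustrations, not IE's actual implementation.

```python
# Hypothetical, simplified decoder: a known entity name may end without ';'.
# Assumption: greedy longest-match against a small name table; this only
# illustrates the hazard and is not how any real browser is implemented.

def decode_no_semicolon(text: str, entities: dict[str, str]) -> str:
    """Replace '&name' (with or without a trailing ';') using the longest known name."""
    names = sorted(entities, key=len, reverse=True)  # try longer names first
    out = []
    i = 0
    while i < len(text):
        if text[i] == "&":
            rest = text[i + 1:]
            for name in names:
                if rest.startswith(name):
                    out.append(entities[name])
                    i += 1 + len(name)
                    if i < len(text) and text[i] == ";":
                        i += 1  # consume the optional semicolon
                    break
            else:
                out.append("&")  # no known name: keep the ampersand literally
                i += 1
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


old_table = {"lt": "<", "gt": ">", "amp": "&"}
new_table = {**old_table, "le": "\u2264"}  # the same table after adding &le

print(decode_no_semicolon("less&less", old_table))  # less&less
print(decode_no_semicolon("less&less", new_table))  # less≤ss
```

The page text never changed; only the entity table did, yet the rendered output differs. That is the backward-compatibility trap described above.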

Microsoft did not dare to risk this, so entities beyond Latin-1 require
a semicolon in IE, even in cases where it is optional according
to SGML (and therefore will pass HTML 4.01 validation, I might add).
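The two-tier behaviour can be modelled roughly as follows. This is a sketch under stated assumptions: `LEGACY` and `MODERN` are hypothetical, heavily truncated stand-ins for the Latin-1-era and later entity sets, and the prefix matching for semicolon-less legacy names is a simplification rather than a documented description of IE's parser. It also covers the fianc&eacutee vs. &oeliguvre case from the top of this thread.

```python
import re

# Hypothetical, truncated tables for illustration only.
LEGACY = {"eacute": "\u00e9", "amp": "&", "lt": "<", "gt": ">"}  # ';' optional
MODERN = {"oelig": "\u0153", "le": "\u2264"}                     # ';' required
ALL = {**LEGACY, **MODERN}

_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*)(;?)")


def decode_ie_style(text: str) -> str:
    """Decode entities under the assumed two-tier rule sketched above."""

    def replace(m: re.Match) -> str:
        name, semi = m.group(1), m.group(2)
        if semi and name in ALL:
            # With a semicolon, any known name (legacy or modern) is decoded.
            return ALL[name]
        # Otherwise only a legacy name may match, as a prefix of the run.
        for legacy in sorted(LEGACY, key=len, reverse=True):
            if name.startswith(legacy):
                return LEGACY[legacy] + name[len(legacy):] + semi
        return m.group(0)  # leave unrecognised text untouched

    return _ENTITY.sub(replace, text)


print(decode_ie_style("fianc&eacutee"))  # fiancée
print(decode_ie_style("&oeliguvre"))     # &oeliguvre
print(decode_ie_style("&oelig;uvre"))    # œuvre
```

Under this model, "less&less" stays intact because "le" is only in the semicolon-required tier. The cost is that "&oelig uvre" is also left undecoded, even though SGML would allow the semicolon to be dropped before whitespace there.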

-- 
Oistein E. Andersen
Received on Wednesday, 27 June 2007 12:45:20 UTC
