[whatwg] Entity parsing [trema/diæresis vs umlaut] from Øistein E. Andersen on 2007-06-23 (public-whatwg-archive@w3.org from June 2007)

From: Øistein E. Andersen <html5@xn--istein-9xa.com>
Date: Sat, 23 Jun 2007 23:27:44 +0200
Message-ID: <E1I2D8y-000AII-C4@node1-3.ouvaton.local>

Sander wrote:

> Are there any char-sets that have both umlaut and trema variations of characters?

Unicode does not make the distinction, so this is somewhat unlikely.
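
A minimal sketch of what this means in practice, assuming Python 3 and its standard `unicodedata` module (the helper name `decomposed_names` is made up for illustration): canonical decomposition maps both a German umlauted u and a French i with trema to a base letter followed by the same combining mark, U+0308 COMBINING DIAERESIS.

```python
import unicodedata

def decomposed_names(ch):
    """Return the Unicode names of the code points in the NFD form of ch."""
    return [unicodedata.name(c) for c in unicodedata.normalize("NFD", ch)]

# U+00FC (u with two dots, as in German) and U+00EF (i with two dots, as in French)
for ch in "\u00fc\u00ef":
    print(unicodedata.name(ch), "->", decomposed_names(ch))
```

Both characters end in the same combining code point, so the encoding alone cannot say whether the dots are meant as an umlaut or as a trema.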

(Personally, I tend to think that the apparent preference for umlaut dots closer
to the letter than trema dots can be linked to extrinsic phenomena like the
preference for steep accents in French typography.)

Kristof Zelechovski wrote:

> Only the vowel U can have either

This is not quite right. All Latin vowels (a, e, i, o, u, y) can take the trema/diæresis
(ä, ë, ï, ö, ü in Dutch; ë, ï, ü*, ÿ** in French), and a, o, u can all be umlauted (ä, ö, ü
in German).

Moreover, the double-dot accent also has other uses (e.g., ? and ? both designate
a stressed schwa in Luxembourgeois), so it is probably not advisable
to attempt a complete classification in HTML.

-- 
Øistein E. Andersen

*) possibly only in the word capharnaüm (disregarding the highly unpopular
rectifications orthographiques of 1990) and in proper names
**) only in proper names

Received on Saturday, 23 June 2007 14:27:44 UTC