[whatwg] Entity parsing [trema/diæresis vs umlaut]

On 25 Jun 2007, at 11:44AM, Křištof Želechovski wrote:

> A stressed schwa is present in Polish maritime dialect as well (Kaszëbszczi)
> and Slovaks write "mäso" for "miaso" (meat), but that is not the point.  All
> such uses can be covered under the hood of the dieresis;

I really do not understand why these uses of the double-dot diacritic
should be considered as instances of the diæresis (see below).

> the dieresis is not a double accent

I never said "double accent", but you are right in pointing out that I should have
called it a double-dot diacritic rather than a double-dot accent, since
-- strictly speaking -- the only accents are acute, grave and circumflex.

> To make it explicit and plain: the dieresis is a diacritical mark that has
> no intrinsic phonetic connotation, although it is used mostly for separating
> vowels;

As you may know, diæresis derives from the Greek verb διαιρεῖν (diairein),
which means “to divide”, and it does indeed have an intrinsic meaning.

According to the OED, a diæresis is “[t]he sign (¨) marking [a phonological diæresis], or,
more usually, placed over the second of two vowels which otherwise make a
diphthong or single sound, to indicate that they are to be pronounced separately.”

Similarly, umlaut is defined as “[t]he diacritical sign (¨) placed over a vowel to
indicate that [umlaut] has taken place.”

Hence, the use of either term when the double-dot diacritic is performing
another linguistic function is equally abusive.

> the phonetic meaning of umlaut is generic and well-defined by its
> very name and it does not apply to the vowel I.

Indeed. German umlaut notation is further restricted, and I am not quite sure
if the phonetic phenomenon applies to y either, but this is rather far off topic.

> I did not intend to make HTML support all possible linguistic intricacies;
> I only wanted to eliminate the common nonsense of denoting ï with &iuml;
> [...]
>  I only want the true umlaut to be distinct, not as a code point but as an entity name.
> [...]
> It would be up to the author to determine whether &uuml; or &utrema;
> is appropriate; both entities should denote the same character.

Do you really think it is a good idea to introduce twelve new aliases
that do not work in current browsers, do not make the language more
expressive and require authors to make meaningless decisions?
(Is Slovak ä borrowed from German [it is pronounced æ or ɛ] and
therefore &auml; or does it have another origin? Should we use
&atrema; by default? How about Pinyin ü? Swedish words that contain
an ? as a result of umlaut vs those that contain it for a different reason?)
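
The practical side of this objection can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the *trema names are the hypothetical aliases proposed in this thread and are not defined in HTML, and the alias table and the resolve() helper are invented here for the example. The standard library's html.unescape stands in for an existing parser.

```python
import html

# The twelve vowels that already have a "...uml" entity name in HTML.
VOWELS = "aeiouyAEIOUY"

# Hypothetical aliases as proposed in the thread (NOT defined in HTML):
# each "...trema" name maps onto the existing "...uml" name.
TREMA_ALIASES = {f"{v}trema": f"{v}uml" for v in VOWELS}


def resolve(name: str) -> str:
    """Resolve an entity name, treating a hypothetical trema alias as its uml twin."""
    canonical = TREMA_ALIASES.get(name, name)
    return html.unescape(f"&{canonical};")


# An existing entity name resolves to a single character.
print(repr(html.unescape("&uuml;")))
# A name the parser does not know is passed through unchanged.
print(repr(html.unescape("&utrema;")))
# With the alias table, both names denote the same character.
print(resolve("utrema") == resolve("uuml"))
```

The alias table adds twelve names without adding a single new character, so the choice between the two spellings is invisible in the parsed document.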

Trema or diæresis might have been a better choice than umlaut as a generic name,
since umlaut does not apply to all Latin vowels, but it is really too late to fix this now.
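
For comparison, Unicode character names use "diaeresis" as the generic term, whatever the linguistic function of the mark. A minimal sketch using Python's unicodedata module shows this; the describe() helper is invented for this example, and the sample words are ones mentioned in this thread.

```python
import unicodedata


def describe(ch: str) -> tuple[str, str, str]:
    """Return the character name, the base letter and the combining mark's name."""
    base, mark = unicodedata.normalize("NFD", ch)
    return unicodedata.name(ch), base, unicodedata.name(mark)


# German umlaut, Dutch trema and French tréma, written with escapes so the
# example does not depend on how the source file itself is normalised.
samples = [
    ("führer", "\u00fc"),
    ("coördinatie", "\u00f6"),
    ("capharnaüm", "\u00fc"),
]

for word, ch in samples:
    name, base, mark = describe(ch)
    print(f"{word}: {name} = {base} + {mark}")
```

The German and the French ü are the same code point and decompose to the same base letter and combining mark, so any umlaut/trema distinction would have to live in the entity name alone.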


On 25 Jun 2007, at 11:51AM, Křištof Želechovski wrote:

> Could I have an example of &otrema; please? 

The canonical example in Dutch seems to be coördinatie, see
http://nl.wikipedia.org/wiki/Trema_in_de_Nederlandse_spelling .

> Something along the lines of zoölogy, but actually required?

Well, such spellings are "actually required" in some varieties of English.
“The New Yorker mandates that authors must coöperate to reëducate our
readership.” – allegedly from the magazine’s style manual.


On 25 Jun 2007, at 11:16AM, Křištof Želechovski wrote:

> there is no language that could make use of this distinction by having both
> &uuml; and &utrema;.  There are languages that use &uuml; and theoretically
> there could be ones that use &utrema;, although I do not know of any valid case
> (I consider the French case invalid).

I have no idea why you consider capharnaüm to be invalid (if this is what you imply),
but perhaps Spanish pingüino and Dutch reünie will be more convincing examples.

French dictionaries require loan-words like angström, führer and länder (plural
of land) to be spelt with an umlaut, but these are of course too rare for
a differentiation tréma/umlaut to have developed, and I would imagine
German imports with umlaut to be only slightly more common in Dutch.

It would be interesting to see whether 19th-c. German actually made a
distinction between umlaut on a, o, u and diæresis on e, i (e.g., Rhomboïd),
but I do not know how consistently the diæresis was used, and words
requiring it are typically foreign words that, unlike the rest, will not have
been printed in Fraktur...

-- 
Øistein E. Andersen

Received on Monday, 25 June 2007 06:45:39 UTC