[whatwg] Entity parsing [trema/diaresis vs umlaut]

Of course you are right; I was thinking of the tréma when I wrote that and I
changed it to a dieresis afterwards to make it more English (to get rid of
the red underlines).  A general qui pro quo followed.
Slovak ä is an original invention; the tréma palatalizes the preceding
consonant.  I did not consider capharnaüm invalid but irrelevant: it is a
Hebrew (or Aramaic?) proper name and can be regarded as a transcription.
Thanks
Chris

-----Original Message-----
From: whatwg-bounces@lists.whatwg.org
[mailto:whatwg-bounces at lists.whatwg.org] On Behalf Of Oistein E. Andersen
Sent: Monday, June 25, 2007 3:46 PM
To: whatwg at whatwg.org; giecrilj at stegny.2a.pl
Subject: Re: [whatwg] Entity parsing [trema/diaresis vs umlaut]

On 25 Jun 2007, at 11:44AM, Křištof Želechovski wrote:

> To make it explicit and plain: the dieresis is a diacritical mark that has
> no intrinsic phonetic connotation, although it is used mostly for separating
> vowels;

As you may know, diaeresis derives from the Greek verb διαιρεῖν (diairein),
which means "to divide", and it does indeed have an intrinsic meaning.

According to the OED, a diaeresis is "[t]he sign (¨) marking [a phonological
diaeresis], or, more usually, placed over the second of two vowels which
otherwise make a diphthong or single sound, to indicate that they are to be
pronounced separately."

Similarly, umlaut is defined as "[t]he diacritical sign (¨) placed over a
vowel to indicate that [umlaut] has taken place."

Hence, the use of either term when the double-dot diacritic is performing
another linguistic function is equally abusive.

> the phonetic meaning of umlaut is generic and well-defined by its
> very name and it does not apply to the vowel I.

Indeed. German umlaut notation is further restricted, and I am not quite sure
if the phonetic phenomenon applies to y either, but this is rather far off
topic.

> I did not intend to make HTML support all possible linguistic intricacies;
> I only wanted to eliminate the common nonsense of denoting i with &iuml;
> [...]
> I only want the true umlaut to be distinct, not as a code point but as an
> entity name.
> [...]
> It would be up to the author to determine whether &uuml; or &utrema;
> is appropriate; both entities should denote the same character.

Do you really think it is a good idea to introduce twelve new aliases
that do not work in current browsers, do not make the language more
expressive and require authors to make meaningless decisions?
(Is Slovak ä borrowed from German [it is pronounced a or ?] and
therefore &auml; or does it have another origin? Should we use
&atrema; by default? How about Pinyin ü? Swedish words that contain
an ? as a result of umlaut vs those that contain it for a different reason?)
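
To make the compatibility point concrete, here is a minimal sketch in Python
(an illustration only, not a claim about any particular browser). It assumes
that the standard library's html.unescape, which decodes against the HTML5
named character reference table, is a reasonable stand-in for a consumer that
does not know the proposed names. &utrema; is the hypothetical alias from this
thread; it is not defined anywhere.

```python
import html


def resolve(reference: str) -> str:
    """Decode a character reference using Python's HTML5 entity table."""
    return html.unescape(reference)


# The existing entity decodes to U+00FC (u with two dots above).
existing = resolve("&uuml;")

# The proposed alias is not in the table, so the text passes through unchanged
# and the reader would see the raw reference instead of the letter.
proposed = resolve("&utrema;")

print(repr(existing), repr(proposed))
```

An author who picked the new spelling would thus get visibly broken text in
any consumer that has not been updated, for a character that is identical in
either case.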

Trema or diaeresis might have been a better choice than umlaut as a generic
name, since umlaut does not apply to all Latin vowels, but it is really too
late to fix this now.
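
As a side note on the naming, the following small Python sketch (again only an
illustration) puts the HTML entity name next to the Unicode character name for
the six lowercase Latin-1 letters that carry the double dot. It relies on the
standard modules html.entities and unicodedata; the variable names are mine.

```python
import unicodedata
from html.entities import codepoint2name

# Lowercase Latin-1 letters with the double-dot diacritic: a, e, i, o, u, y.
letters = "\u00e4\u00eb\u00ef\u00f6\u00fc\u00ff"

# Pair each HTML 4 entity name with the Unicode name of the same character.
rows = [(codepoint2name[ord(ch)], unicodedata.name(ch)) for ch in letters]

for entity_name, unicode_name in rows:
    print(f"&{entity_name};  {unicode_name}")
```

The entity names all end in "uml", whereas the Unicode names all end in
"WITH DIAERESIS", so the generic term is already in use on the character side
even though the entity names cannot reasonably change.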


On 25 Jun 2007, at 11:51AM, Křištof Želechovski wrote:

> Could I have an example of &otrema; please? 

The canonical example in Dutch seems to be coördinatie, see
http://nl.wikipedia.org/wiki/Trema_in_de_Nederlandse_spelling .

> Something along the lines of zoölogy, but actually required?

Well, such spellings are "actually required" in some varieties of English.
"The New Yorker mandates that authors must coöperate to reëducate our
readership." - allegedly from the magazine's style manual.


On 25 Jun 2007, at 11:16AM, Křištof Želechovski wrote:

> there is no language that could make use of this distinction by having both
> &uuml; and &utrema;.  There are languages that use &uuml; and theoretically
> there could be ones that use &utrema;, although I do not know of any valid
> case (I consider the French case invalid).

I have no idea why you consider capharnaüm to be invalid (if this is what you
imply), but perhaps Spanish pingüino and Dutch reünie will be more convincing
examples.

-- 
Oistein E. Andersen

Received on Monday, 25 June 2007 23:46:44 UTC