Re: [css3-text; css3-fonts] Uppercasing ß 2011 edition from Christoph Päper on 2011-11-09 (www-style@w3.org from November 2011)

From: Christoph Päper <christoph.paeper@crissov.de>
Date: Wed, 9 Nov 2011 14:07:00 +0100
To: W3C Style <www-style@w3.org>
Message-Id: <4749C40D-EC21-46D0-B61E-141FF47B1573@crissov.de>

Tim Tepaße:
> 
> toupper("ß") = SS
> toupper("ß") = SZ  // somewhat more exotic

‘ß’ → ‘SS’ and ‘ß’ → ‘SZ’ are orthographic transformations if you consider ‘ß’ a letter, as 21st-century German orthography and some 20th-century legislation do. 
One of them, usually the former, is a typographic transformation if you consider ‘ß’ a ligature, as 20th-century German orthography and some earlier ones did.
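The two orthographic spellings amount to a plain string rewrite before uppercasing. A minimal Python sketch; `upper_orthographic` and its `use_sz` flag are hypothetical names chosen for illustration, and the choice between the two spellings would in practice depend on language and context, which this helper simply takes as a parameter:

```python
SHARP_S = "\u00DF"  # ß  LATIN SMALL LETTER SHARP S


def upper_orthographic(text: str, use_sz: bool = False) -> str:
    """Uppercase text, spelling ß out as two letters (hypothetical helper).

    use_sz=False gives the usual ß -> SS; use_sz=True gives the rarer ß -> SZ.
    """
    replacement = "SZ" if use_sz else "SS"
    # Rewrite ß first so the outcome does not depend on str.upper()'s own ß rule.
    return text.replace(SHARP_S, replacement).upper()


print(upper_orthographic("Maße"))               # MASSE
print(upper_orthographic("Maße", use_sz=True))  # MASZE
```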

Note that the inverse, i.e. * → ‘ß’, is never orthographically required, but there may be typographic instances thereof.

> … code point U+1E9E for a capital sharp S was encoded in Unicode 5.1. It's a code point which is not a target of toupper().

‘ß’ → ‘ẞ’ is a typographic transformation, if you consider ‘ß’ a letter. 
It is nonsense, if you consider ‘ß’ a ligature.
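The quoted point about U+1E9E can be checked against the default Unicode case mappings, which Python 3's `str.upper()` and `str.lower()` implement (assuming a Python 3 build whose Unicode database is newer than 5.1, which holds for all current releases):

```python
SHARP_S = "\u00DF"          # ß  LATIN SMALL LETTER SHARP S
CAPITAL_SHARP_S = "\u1E9E"  # ẞ  LATIN CAPITAL LETTER SHARP S (encoded in Unicode 5.1)

# Full (language-agnostic) uppercase mapping: ß expands to two letters.
print(SHARP_S.upper())                     # SS
# ẞ does lowercase to ß ...
print(CAPITAL_SHARP_S.lower())             # ß
# ... but the mapping is not symmetric: ß never uppercases to ẞ by default.
print(SHARP_S.upper() == CAPITAL_SHARP_S)  # False
```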

For roman base letters, orthographic or graphemic (i.e. language-dependent) transforms and typographic or graphetic (i.e. language-agnostic) transforms are usually identical. The intended mechanism and domain of the CSS property ‘text-transform’ aren’t clearly defined yet, and the Unicode waters are often a bit muddy, too. Judging the property by its name, which contains “text”, one would expect language dependency; otherwise it should be called ‘character-transform’. (A hypothetical ‘glyph-transform’ or ‘glyph-select’ would be yet another beast, usually dealt with at the (smart) font level, but also being made accessible in CSS via ‘font-variant-alternates’.)

Basic typographic transformations are usually easier for developers to implement, but orthographic transformations are often what users want. Even if you took linguistic features into account for ‘text-transform’, you couldn’t reasonably stop at the lexemic level but would have to move on to the syntactic level, especially for titlecase, where some words should stay lowercase, at least under certain circumstances.
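The titlecase problem can be illustrated by contrasting a purely character-level rule, close in spirit to what CSS ‘text-transform: capitalize’ does (uppercase the first letter of every word), with a word-aware one. This Python sketch uses a hypothetical, English-only `MINOR_WORDS` list as a crude stand-in for real syntactic analysis; actual style rules differ by language and publisher:

```python
# Hypothetical list of "minor" words; real style guides differ and are language-specific.
MINOR_WORDS = {"a", "an", "and", "in", "of", "on", "or", "the", "to"}


def naive_titlecase(text: str) -> str:
    # Character-level: capitalize the first letter of every word.
    return " ".join(w[:1].upper() + w[1:] for w in text.split())


def titlecase_with_minor_words(text: str) -> str:
    # Word-aware: keep minor words lowercase unless they open or close the title.
    words = text.split()
    out = []
    for i, w in enumerate(words):
        if 0 < i < len(words) - 1 and w.lower() in MINOR_WORDS:
            out.append(w.lower())
        else:
            out.append(w[:1].upper() + w[1:])
    return " ".join(out)


print(naive_titlecase("the lord of the rings"))             # The Lord Of The Rings
print(titlecase_with_minor_words("the lord of the rings"))  # The Lord of the Rings
```

Even this heuristic only looks at individual words; deciding when such words must be capitalized anyway (e.g. after a colon) requires the sentence-level knowledge mentioned above.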

In conclusion, although it does not have the perfect name for the task, ‘text-transform’ already exists and should do the simple things for now. That means it should ignore language information. Since ligatures should not be encoded as single characters, ‘ß’ should be treated as a letter. Therefore, uppercasing should always transform it to ‘ẞ’, despite what Unicode may say.
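A language-agnostic uppercasing along the lines proposed here could be sketched as follows, assuming Python 3 strings; `upper_typographic` is a hypothetical helper, not part of any CSS engine or library:

```python
SHARP_S = "\u00DF"          # ß
CAPITAL_SHARP_S = "\u1E9E"  # ẞ


def upper_typographic(text: str) -> str:
    """Uppercase without language information, mapping ß to ẞ (hypothetical helper)."""
    # Swap ß for ẞ first: ẞ is already uppercase, so str.upper() leaves it alone
    # instead of applying its default ß -> "SS" expansion.
    return text.replace(SHARP_S, CAPITAL_SHARP_S).upper()


print(upper_typographic("Straße"))  # STRAẞE
```

Unlike the default expansion to ‘SS’, mapping ‘ß’ to ‘ẞ’ is one character for one character, which also makes the transform reversible by lowercasing.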

CSS 3 and later levels may, and should, add at least two other properties: one controlling glyph selection and one controlling language-dependent rules. Instead of a separate property, advanced linguistic knowledge could be required only for new values of the ‘text-transform’ property. Text transformation may also affect glyph selection, although it is not meant for fine control over it.

Received on Wednesday, 9 November 2011 13:07:40 UTC