[css3-text] @text-transform from Christoph Päper on 2011-12-03 (www-style@w3.org from December 2011)

From: Christoph Päper <christoph.paeper@crissov.de>
Date: Sat, 03 Dec 2011 12:43:25 +0100
To: www-style list <www-style@w3.org>
Message-id: <6DFCA555-6211-46E5-8F6C-E1D3A99E4EED@crissov.de>
Florian Rivoal:
> On Fri, 02 Dec 2011 16:53:29 +0100, Brad Kemper <brad.kemper@gmail.com> wrote:
> 
>> Have you considered some limited form of regex?

Too much, at least for the first level. Regular expressions also often assume an outdated relation between symbols, characters, bytes and graphemes. Attribute selectors are an example of how to work around them.

> Well the transform I proposed turned long s into short s, and that doesn't need any boundary check. If you want to do the opposite transform, you would need something like that.

Yes, but I think that is a very valid use case, even though some smart fonts try to do it, too. In fact, ‘@text-transform’ would be little more than a higher-level equivalent of the OpenType ‘GSUB’ (glyph substitution) tables.

Several unsorted suggestions:

Some ‘strtr’-like functions I’ve come across accept lists of different lengths, as this first draft does. Instead of truncating the longer one, however, some of them require the replacement string/array to be shorter than or equal to the source and use its last character for all remaining ones. That is especially useful for the border case where the replacement contains only one character.

  @text-transform germanize {
    convert: "æ ø œ" to "ä ö";
    convert: "Æ Ø Œ" to "Ä Ö";
  }
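To make the padding rule concrete, here is a minimal Python sketch of it. It is not part of any spec; `build_mapping` and `germanize` are hypothetical names. It pairs whitespace-separated items and reuses the last replacement item for the rest, which is how Perl's `tr///` treats a short replacement list. A longer replacement list is rejected, as described above:

```python
def build_mapping(source: str, replacement: str) -> dict[str, str]:
    """Pair whitespace-separated source items with replacement items.

    Hypothetical helper: if the replacement list is shorter, its last
    item is reused for all remaining source items.
    """
    src = source.split()
    dst = replacement.split()
    if not dst:
        raise ValueError("replacement list must not be empty")
    if len(dst) > len(src):
        raise ValueError("replacement list must not be longer than the source list")
    padded = dst + [dst[-1]] * (len(src) - len(dst))
    return dict(zip(src, padded))


def germanize(text: str) -> str:
    # Rough equivalent of the two 'convert' descriptors in the example above.
    mapping = {}
    mapping.update(build_mapping("æ ø œ", "ä ö"))
    mapping.update(build_mapping("Æ Ø Œ", "Ä Ö"))
    # str.maketrans accepts a dict whose keys are single characters.
    return text.translate(str.maketrans(mapping))


print(germanize("Œuvre, søster, Æble"))  # Öuvre, söster, Äble
```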

I think a ‘language’ (or rather writing-system) descriptor is necessary, and you should be able to extend predefined transforms:

  @text-transform uppercase {
    convert: "ß" to "ẞ";
    language: de;
  }
  @text-transform uppercase {
    convert: "i ı" to "İ I";
    language: tr, tk /*…*/;
    scope: case;
  }
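A rough sketch of how such language-specific extensions could layer on top of a predefined transform. It assumes that overrides simply take precedence, character by character, over Python's default `str.upper()`. The override table and the `uppercase` function are hypothetical and only cover the German and Turkish cases above:

```python
# Hypothetical per-language overrides; everything else falls back to str.upper().
UPPERCASE_OVERRIDES = {
    "de": {"ß": "ẞ"},              # default upper() would give "SS"
    "tr": {"i": "İ", "ı": "I"},    # dotted/dotless i keep their distinction
}


def uppercase(text: str, lang: str) -> str:
    # Only the primary language subtag is used, e.g. "de-CH" -> "de".
    primary = lang.split("-")[0].lower()
    overrides = UPPERCASE_OVERRIDES.get(primary, {})
    return "".join(overrides.get(ch, ch.upper()) for ch in text)


print(uppercase("Straße", "de"))    # STRAẞE
print(uppercase("istanbul", "tr"))  # İSTANBUL
print(uppercase("istanbul", "en"))  # ISTANBUL
```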

It may be helpful to be able to replace only base characters while keeping diacritics intact, and also to replace only diacritics while keeping base characters the same. I’m not sure about further, more or less informative descriptors like the ‘scope’ used above, though.

  @text-transform oldstyle {
    convert: "Th th Dh dh" to "Þ þ Ð ð";
    convert: "Å å Ø ø Æ æ" to "Aa aa Oi oi Ae ae";
    convert: "Ä ä Ö ö Ü ü" to "Ae ae Oe oe Ue ue";/* diaeresis to postscript e */
    language: de, da, no, sv, en;
  }
  @text-transform oldstyle {
    convert: "\0308";/* remove diaeresis */
    language: da, no, sv;
    scope: diacritics;
  }
  @text-transform oldstyle {
    convert: "Å å Ø ø Æ æ" to "Aa aa Oi oi Ae ae";
    convert: "\0308" to "\0364";/* diaeresis to superscript e */
    language: de;
    scope: diacritics;
  }
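One way a ‘scope’ descriptor might be realized is to work on the canonically decomposed (NFD) text. The mapping is applied either only to combining marks or only to the other characters, and the result is recomposed to NFC. This is a minimal sketch with a hypothetical `convert_scoped` function. Using a non-zero canonical combining class as the test for "diacritic" is an approximation: precomposed letters without a decomposition, such as ø, are treated as plain bases.

```python
import unicodedata


def convert_scoped(text: str, mapping: dict[str, str], scope: str) -> str:
    """Apply `mapping` only to combining marks (scope="diacritics")
    or only to non-mark characters (scope="bases")."""
    want_marks = scope == "diacritics"
    out = []
    for ch in unicodedata.normalize("NFD", text):
        # Rough test: combining marks have a non-zero canonical combining class.
        is_mark = unicodedata.combining(ch) != 0
        out.append(mapping.get(ch, ch) if is_mark == want_marks else ch)
    # Recompose where a precomposed character exists.
    return unicodedata.normalize("NFC", "".join(out))


print(convert_scoped("Müller", {"\u0308": ""}, "diacritics"))        # Muller
print(convert_scoped("Bäcker", {"\u0308": "\u0364"}, "diacritics"))  # a followed by U+0364
print(convert_scoped("Ärger", {"A": "O"}, "bases"))                  # Örger
```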

Since you may want to work across character boundaries, the strings should be whitespace-separated lists:

  @text-transform orthographic-ligatures {
    convert: "ij" to "ĳ";
    language: nl;
  }
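Because an item in a whitespace-separated list may span several characters, a conversion has to match substrings rather than single code points. Here is a minimal sketch using greedy longest-match-first replacement. That strategy is an assumption, because the proposal does not say how overlapping items are resolved. `convert` and `pairs` are hypothetical helpers:

```python
def convert(text: str, mapping: dict[str, str]) -> str:
    """Replace items greedily, trying longer source items first."""
    keys = sorted(mapping, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(mapping[key])
                i += len(key)
                break
        else:
            # No item matches here: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


def pairs(source: str, target: str) -> dict[str, str]:
    # Assumes equally long lists; zip() silently drops any extra items.
    return dict(zip(source.split(), target.split()))


print(convert("bijna ijs", pairs("ij", "ĳ")))                # bĳna ĳs
print(convert("Thither", pairs("Th th Dh dh", "Þ þ Ð ð")))  # Þiþer
```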

  @text-transform titlecase {
    convert: "and or the an a on by in at from to with without within out of off into onto upon" /*…*/
          to "and or the an a on by in at from to with without within out of off into onto upon" /*…*/;
    language: en;
  }

Maybe the ‘titlecase’ example is also a use case for making ‘to’ and the following parts optional:

  @text-transform titlecase {/* exclusions */
    convert: "and or the an a on by in at from to with without within out of off into onto upon" /*…*/;
    language: en;
  }
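Here is a rough sketch of what such an exclusion list would do. Listed words are left unchanged, which is the same as converting them to themselves, and every other word gets an initial capital. The rule that the first word is always capitalized is a common English style convention and is assumed here, since it is not part of the proposal. The `titlecase` function is hypothetical, and the word list is only the part quoted above, without the elided remainder:

```python
EXCLUSIONS = frozenset(
    "and or the an a on by in at from to with without within out of off into onto upon".split()
)


def titlecase(text: str, exclusions: frozenset[str] = EXCLUSIONS) -> str:
    out = []
    for i, word in enumerate(text.split(" ")):
        if i > 0 and word.lower() in exclusions:
            out.append(word)  # excluded word: left as it is
        else:
            out.append(word[:1].upper() + word[1:])
    return " ".join(out)


print(titlecase("the lord of the rings"))  # The Lord of the Rings
```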
Received on Saturday, 3 December 2011 11:44:00 UTC