Re: [css3-text] @text-transform from Florian Rivoal on 2011-12-05 (www-style@w3.org from December 2011)

From: Florian Rivoal <florianr@opera.com>
Date: Mon, 05 Dec 2011 12:34:20 +0100
To: www-style@w3.org
Message-ID: <op.v50k3ixr4p7avi@localhost.localdomain>
On Sat, 03 Dec 2011 12:43:25 +0100, Christoph Päper  
<christoph.paeper@crissov.de> wrote:

> Some ‘strtr’-like functions I’ve come across accept different length  
> fields like this first draft does, but instead of truncating the longer  
> one, some of them require the replacement string/array to be shorter  
> (or equal) and use its last character for all remaining ones. That is  
> useful especially for the border case where the replacement contains  
> only one character.
>
>   @text-transform germanize {
>     convert: "æ ø œ" to "ä ö";
>     convert: "Æ Ø Œ" to "Ä Ö";
>   }

This is interesting indeed. I am a bit worried about non-obvious effects
when one list is shorter than the other by mistake, but to be fair, my
proposal has the same problem. If the 'from' side is shorter, do you then
truncate the 'to' side to match, or reject the conversion entirely?
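
To make the choice concrete, here is a rough Python sketch of the three
behaviours being discussed for lists of unequal length. The helper
`build_map` and the policy names 'truncate', 'pad-last' and 'reject' are
made up for this illustration; neither proposal defines them.

```python
def build_map(src, dst, policy):
    """Pair up two whitespace-separated lists under a mismatch policy.

    Hypothetical helper: the policy names are labels invented for this
    sketch, not proposed syntax.
    """
    a, b = src.split(), dst.split()
    if len(a) != len(b):
        if policy == "reject":
            return None  # drop the whole conversion
        if policy == "pad-last" and b and len(b) < len(a):
            # reuse the last replacement for all remaining 'from' entries
            b = b + [b[-1]] * (len(a) - len(b))
        # 'truncate' (and 'pad-last' with a longer 'to' side):
        # zip() below silently stops at the shorter list
    return dict(zip(a, b))


print(build_map("æ ø œ", "ä ö", "truncate"))
print(build_map("æ ø œ", "ä ö", "pad-last"))
print(build_map("æ ø œ", "ä ö", "reject"))
```

With 'truncate', "œ" is left unmapped; with 'pad-last' it maps to "ö";
with 'reject' nothing is converted at all. In every case a list that is
short by mistake produces no visible error, which is the worry above.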

> I think a ‘language’ (writing system rather) descriptor is necessary and  
> you should be able to extend predefined types:
>
>   @text-transform uppercase {
>     convert: "ß" to "ẞ";
>     language: de;
>   }
>   @text-transform uppercase {
>     convert: "i ı" to "İ I";
>     language: tr, tk /*…*/;
>     scope: case;
>   }

I don't understand what 'scope: case' is supposed to mean here. As for the
language, I think it belongs in the selector rather than here:

@text-transform german-uppercase { convert: "ß" to "ẞ", uppercase; }

h1[lang="de"] { text-transform:german-uppercase; }

You also use the language to implicitly combine multiple @text-transforms.
I prefer explicit combination, by referring to previously defined transforms
by name:

@text-transform foo { convert: "a" to "b"; }
@text-transform bar { convert: foo, "c" to "d"; }

First, I find it more readable, but also, it gives the author control over
the order in which the various parts are applied.
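
As a small illustration of why the order matters, here is a Python sketch
that models a transform as a list of steps applied one after another. That
sequential model is my assumption for the example, and `make_transform` is
a hypothetical helper, not part of the proposal.

```python
def make_transform(*steps):
    """Build a transform from steps applied left to right.

    Each step is either a (from, to) pair of single characters or a
    previously built transform. Hypothetical model, not proposed syntax.
    """
    def apply(text):
        for step in steps:
            if callable(step):
                text = step(text)              # reuse a named transform
            else:
                src, dst = step
                text = text.replace(src, dst)  # plain character conversion
        return text
    return apply


foo = make_transform(("a", "b"))
bar = make_transform(foo, ("b", "c"))  # foo first, then b -> c
baz = make_transform(("b", "c"), foo)  # b -> c first, then foo

print(bar("ab"))  # cc
print(baz("ab"))  # bc
```

The same two parts give "cc" or "bc" depending on which comes first, so
leaving the order to an implicit language-based merge would take that
decision away from the author.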

> Since you may want to work across character boundaries, the strings  
> should be whitespace-separated lists:
>
>   @text-transform orthographic-ligatures {
>     convert: "ij" to "ĳ";
>     language: nl;
>   }

>   @text-transform oldstyle {
>     convert: "Th th Dh dh" to "Þ þ Ð ð";
>   }


There are some advantages to this:
- No need to worry about legacy vs. extended grapheme cluster vs.
   single Unicode code point, as the author makes it explicit
- More expressive, as you can indeed work on multiple characters
   at a time

Yet, I am not sure, as the processing model becomes more complicated
if we don't limit ourselves to 1 character at a time. For example, if
you modify "arrgh" with the following transform, what do you get?

@text-transform dont-do-this { convert: "r rg rr ar gh" to "1 2 3 4 5"; }

We can define our way out of that, but I am not convinced it
is worth it. Of course there are transforms we won't be able to express
without it, but there will be anyway, so we have to draw the line
somewhere, and I think restricting ourselves to being character based is a
reasonable place to draw the line.
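
To show how much there is to define, here is a Python sketch that runs the
"arrgh" example through three plausible processing models. All three, and
the function names, are assumptions picked for illustration; none of them
is specified anywhere.

```python
def scan(text, src, dst, pick):
    """Scan left to right; at each position replace the match `pick` selects."""
    pairs = list(zip(src.split(), dst.split()))
    out, i = [], 0
    while i < len(text):
        matches = [(s, d) for s, d in pairs if text.startswith(s, i)]
        if matches:
            s, d = pick(matches)
            out.append(d)
            i += len(s)          # consume the matched characters
        else:
            out.append(text[i])  # no entry applies, copy the character
            i += 1
    return "".join(out)


def one_pair_at_a_time(text, src, dst):
    """Apply each pair to the whole string, in the order listed."""
    for s, d in zip(src.split(), dst.split()):
        text = text.replace(s, d)
    return text


def first_listed(matches):
    return matches[0]


def longest(matches):
    return max(matches, key=lambda pair: len(pair[0]))


SRC, DST = "r rg rr ar gh", "1 2 3 4 5"
print(scan("arrgh", SRC, DST, first_listed))   # 415
print(scan("arrgh", SRC, DST, longest))        # 42h
print(one_pair_at_a_time("arrgh", SRC, DST))   # a115
```

Three reasonable readings give three different results, and "rr" never
matches in any of them. With single-character entries the three models
agree as long as no replacement character is itself listed on the 'from'
side.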


  - Florian
Received on Monday, 5 December 2011 11:34:49 UTC