- From: Florian Rivoal <florianr@opera.com>
- Date: Mon, 05 Dec 2011 12:34:20 +0100
- To: www-style@w3.org
On Sat, 03 Dec 2011 12:43:25 +0100, Christoph Päper <christoph.paeper@crissov.de> wrote:

> Some ‘strtr’-like functions I’ve come across accept different length
> fields like this first draft does, but instead of truncating the longer
> one, some of them use require the replacement string/array to be shorter
> (or equal) and use its last character for all remaining ones. That is
> useful especially for the border case where the replacement contains
> only one character.
>
>   @text-transform germanize {
>     convert: "æ ø œ" to "ä ö";
>     convert: "Æ Ø Œ" to "Ä Ö";
>   }

This is interesting indeed. I am a bit worried about non-obvious effects when one list is shorter than the other by mistake, but to be fair, my proposal has the same problem. If the 'from' side is the shorter one, do you then truncate the 'to' side to match, or reject the conversion entirely?

> I think a ‘language’ (writing system rather) descriptor is necessary and
> you should be able to extend predefined types:
>
>   @text-transform uppercase {
>     convert: "ß" to "ẞ";
>     language: de;
>   }
>   @text-transform uppercase {
>     convert: "i ı" to "İ I";
>     language: tr, tk /*…*/;
>     scope: case;
>   }

I don't understand what 'scope: case' is supposed to mean here.

As for the language, I think it belongs in the selector rather than in the @text-transform rule:

  @text-transform german-uppercase {
    convert: "ß" to "ẞ", uppercase;
  }
  h1[lang="de"] { text-transform: german-uppercase; }

You also use the language to implicitly combine multiple @text-transforms. I prefer explicit combination, by referring to previously defined transforms by name:

  @text-transform foo {
    convert: "a" to "b";
  }
  @text-transform bar {
    convert: foo, "c" to "d";
  }

First, I find it more readable. Second, it gives the author control over the order in which the various parts are applied.

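To illustrate why control over the order matters, here is a small Python sketch. It is not part of the proposal: `char_map` and `chain` are hypothetical helpers, Python's `str.upper` stands in for the built-in `uppercase` transform, and I am assuming that the comma-separated parts of `convert` are applied left to right.

```python
from typing import Callable, Dict

Transform = Callable[[str], str]


def char_map(mapping: Dict[str, str]) -> Transform:
    """Hypothetical helper: per-character replacement, unmapped characters pass through."""
    table = str.maketrans(mapping)
    return lambda text: text.translate(table)


def chain(*steps: Transform) -> Transform:
    """Hypothetical helper: apply the given transforms one after another, left to right."""
    def run(text: str) -> str:
        for step in steps:
            text = step(text)
        return text
    return run


sharp_s = char_map({"ß": "ẞ"})

# Mirrors `convert: "ß" to "ẞ", uppercase;` under the left-to-right assumption.
german_uppercase = chain(sharp_s, str.upper)

# The same two parts in the opposite order.
uppercase_first = chain(str.upper, sharp_s)

print(german_uppercase("straße"))  # STRAẞE
print(uppercase_first("straße"))   # STRASSE
```

With Python's default case mapping, "ß" uppercases to "SS", so when the generic uppercasing runs first there is no "ß" left for the custom rule to match. The same two parts give different results depending on their order, which is why I would like the author to state it explicitly.
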
> Since you may want to work across character boundaries, the strings
> should be whitespace-separated lists:
>
>   @text-transform orthographic-ligatures {
>     convert: "ij" to "ij"
>     language: nl;
>   }
>   @text-transform oldstyle {
>     convert: "Th th Dh dh" to "Þ þ Ð ð";
>   }

There are some advantages to this:
- no need to worry about legacy vs extended grapheme cluster vs single Unicode codepoint, as the author makes it explicit
- more expressive, as you can indeed work on multiple characters at a time

Yet I am not sure, as the processing model becomes more complicated if we don't limit ourselves to one character at a time. For example, if you modify "arrgh" with the following transform, what do you get?

  @text-transform dont-do-this {
    convert: "r rg rr ar gh" to "1 2 3 4 5";
  }

We can define our way out of that, but I am not convinced it is worth it. Of course there are transforms we won't be able to express without it, but there always will be some, so we have to draw the line somewhere, and I think restricting ourselves to character-based transforms is a reasonable place to draw it.

 - Florian
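
To make the "arrgh" question concrete, here is a rough Python sketch of three plausible processing models. None of them is defined by the proposal; the function names and the three strategies are assumptions chosen only to show that the answer depends on which model is picked.

```python
from typing import List, Tuple

Rules = List[Tuple[str, str]]

# The pairs from the dont-do-this example, in the order they are listed.
RULES: Rules = [("r", "1"), ("rg", "2"), ("rr", "3"), ("ar", "4"), ("gh", "5")]


def sequential_passes(text: str, rules: Rules) -> str:
    """Assumed model 1: apply each pair to the whole string, one pair after another."""
    for src, dst in rules:
        text = text.replace(src, dst)
    return text


def single_scan(text: str, rules: Rules, prefer_longest: bool) -> str:
    """Assumed models 2 and 3: scan left to right once, consuming matched input.

    At each position the first matching pair wins. With prefer_longest the
    pairs are tried longest source first, otherwise in the listed order.
    """
    ordered = sorted(rules, key=lambda rule: -len(rule[0])) if prefer_longest else rules
    out = []
    i = 0
    while i < len(text):
        for src, dst in ordered:
            if text.startswith(src, i):
                out.append(dst)
                i += len(src)
                break
        else:
            # No pair matches here: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


print(sequential_passes("arrgh", RULES))                  # a115
print(single_scan("arrgh", RULES, prefer_longest=False))  # 415
print(single_scan("arrgh", RULES, prefer_longest=True))   # 42h
```

Three reasonable-looking models give three different results for the same input, and a spec would have to pick one and describe it precisely. With single-character sources only, the three models shown here agree as long as no replacement is itself a source of a later pair.
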
Received on Monday, 5 December 2011 11:34:49 UTC