Re: [css3-text] @text-transform from Christoph Päper on 2011-12-05 (www-style@w3.org from December 2011)

From: Christoph Päper <christoph.paeper@crissov.de>
Date: Mon, 5 Dec 2011 14:07:36 +0100
To: www-style list <www-style@w3.org>
Message-Id: <4349B5EE-C1A4-4501-B981-C7779E348814@crissov.de>
Florian Rivoal:
> On Sat, 03 Dec 2011 12:43:25 +0100, Christoph Päper 
> 
>> I think a ‘language’ (writing system rather) descriptor is necessary
>> and you should be able to extend predefined types:

> I don't understand what 'scope: case' is supposed to mean here.

It was used in an informative way only. That’s not useful; please ignore it.

> As for the language, I think it belongs in the selector rather than here:
> 
> @text-transform german-uppercase { convert: "ß" to "ẞ", uppercase; }
> h1[lang="de"] { text-transform:german-uppercase; }

I disagree. Assuming the markup is correctly tagged for language, you wouldn’t want to use ‘[lang]’, or rather ‘:lang()’, for each and every ruleset whose selector could match instances in more than one language. Instead you’d want to describe your transformation as language-dependent.

  @text-transform german-uppercase {/*…*/, uppercase;}
  @text-transform french-uppercase {/*…*/, uppercase;}
  @text-transform polish-uppercase {/*…*/, uppercase;}
  @text-transform turk-uppercase {/*…*/, uppercase;}
  h1, h4 {text-transform: uppercase;}
  h1:lang(de), h4:lang(de) {text-transform: german-uppercase;}
  h1:lang(fr), h4:lang(fr) {text-transform: french-uppercase;}
  h1:lang(pl), h4:lang(pl) {text-transform: polish-uppercase;}
  h1:lang(tr, tk), h4:lang(tr, tk) {text-transform: turk-uppercase;}

versus

  @text-transform uppercase {language: de; /*…*/}
  @text-transform uppercase {language: fr; /*…*/}
  @text-transform uppercase {language: pl; /*…*/}
  @text-transform uppercase {language: tr, tk; /*…*/}
  h1, h4 {text-transform: uppercase;}

or

  @text-transform uppercase {
    /*…*/ for "de",
    /*…*/ for "fr",
    /*…*/ for "pl",
    /*…*/ for "tr tk";
  }
  h1, h4 {text-transform: uppercase;}

or without overloading: …

  h1, h4 {text-transform: uppercase; text-transform: mycase;}

> You also use the language to implicitly combine multiple @text-transforms.

Yes, like ‘@font-face’ does.

An important question is whether authors should be able to alter or just to extend existing transformations.

One of the major annoyances of ‘text-transform’ in level 2 for many authors is its lack of language-dependence. They’re helped best if they can just extend existing values to the needs of their language(s) and in-house styles.

> I prefer explicit combination, by referring to previously defined transforms by name: (…) First, I find it more readable, but also, it gives the author control over the order in which the various parts are applied.

Recursion should be excluded by design, i.e.

  @… {convert: "A B" to "B A", "AA" to "BB";}
or
  @… {convert: "A B" to "B A"; convert: "AA" to "BB";}
or
  @… foo {convert: "A B" to "B A";}
  @… bar {convert: foo, "AA" to "BB";}

applied to “ABBA” would all yield “BAAB”, and neither “BBBB” nor anything else.
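
As a rough illustration of the non-recursive behaviour described above, here is a minimal Python sketch. The names `apply_once` and `apply_sequentially` are made up for this example, and trying longer search strings first is an assumption, not something any draft specifies. The point is only that replaced output is never scanned again.

```python
def apply_once(text, rules):
    """Apply all rules in a single left-to-right pass.

    Replacement text is appended to the output and never rescanned,
    so one rule cannot feed another (no recursion).
    """
    # Assumption: longer search strings are tried first. sorted() is
    # stable, so equal-length rules keep the order the author wrote.
    ordered = sorted((r for r in rules if r[0]), key=lambda r: -len(r[0]))
    out = []
    i = 0
    while i < len(text):
        for search, replacement in ordered:
            if text.startswith(search, i):
                out.append(replacement)
                i += len(search)
                break
        else:
            # No rule matches here: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


def apply_sequentially(text, rules):
    """Counter-example: each rule rescans the output of the previous one."""
    for search, replacement in rules:
        if search:
            text = text.replace(search, replacement)
    return text


# "A B" to "B A", "AA" to "BB" written out as explicit pairs.
rules = [("A", "B"), ("B", "A"), ("AA", "BB")]

print(apply_once("ABBA", rules))          # BAAB
print(apply_sequentially("ABBA", rules))  # BBBB
```

The second function shows what chained replacement would do to the same input, which is the outcome that should be excluded.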

>> the strings should be whitespace-separated lists:
> 
> - More expressive, as you can indeed work on multiple characters at a time

That can, of course, also be a disadvantage.

> if you modify "arrgh" with the following transform, what do you get?
> 
> @text-transform dont-do-this { convert: "r rg rr ar gh" to "1 2 3 4 5"; }

Yeah, that’s like my “to Toronto” example. 

In transliteration (e.g. romanization), greedy tokenization usually works best, i.e. longer search items first; for equal-length items you would have to trust the order the author used. Thus “arrgh” → “ar{rg}h” → “{ar}{rg}h” → “42h”. (That reminds me, ‘to’ could be ‘>’, ‘=>’ or ‘->’ instead if that helps tokenizers.) I don’t think it’s trivial (or useful) to require a most-matched algorithm, i.e. “{ar}{r}{gh}” → “415”.
With source character cycling, we’d get greedy “arrgh” → “{ar}rgh” → “{ar}{r}gh” → “{ar}{rg}h” → “42h” or lazy “arrgh” → “a{r}rgh” → “a{r}{r}gh” → “a{r}{r}{gh}” → “a115”, with replacement advancing we also arrive at “arrgh” → “a{r}{r}gh” → “a{r}{r}{gh}” → “a115”.
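
To make two of these readings concrete, here is a small Python sketch under stated assumptions. All function names are hypothetical. `replace_rule_by_rule` takes the rules in priority order (longer items first, author order for ties) and only touches text that no earlier rule has consumed; `replace_left_to_right` walks the source once and takes the highest-priority rule that matches at each position. The lazy variant and a most-matched algorithm are not modelled.

```python
def parse_rules(search_list, replacement_list):
    """Pair up two whitespace-separated lists, as in convert: "a b" to "x y"."""
    searches = search_list.split()
    replacements = replacement_list.split()
    if len(searches) != len(replacements):
        raise ValueError("both lists need the same number of items")
    return list(zip(searches, replacements))


def by_priority(rules):
    # Longer search items first; sorted() is stable, so items of equal
    # length keep the order the author used.
    return sorted(rules, key=lambda rule: -len(rule[0]))


def replace_rule_by_rule(text, rules):
    """Apply one rule at a time to the parts not yet replaced."""
    segments = [(text, False)]  # (chunk, already_replaced)
    for search, replacement in by_priority(rules):
        next_segments = []
        for chunk, done in segments:
            if done:
                next_segments.append((chunk, True))
                continue
            for index, part in enumerate(chunk.split(search)):
                if index > 0:
                    # A match sat between the previous part and this one.
                    next_segments.append((replacement, True))
                if part:
                    next_segments.append((part, False))
        segments = next_segments
    return "".join(chunk for chunk, _ in segments)


def replace_left_to_right(text, rules):
    """Scan the source once, taking the best-priority match at each position."""
    ordered = by_priority(rules)
    out = []
    i = 0
    while i < len(text):
        for search, replacement in ordered:
            if text.startswith(search, i):
                out.append(replacement)
                i += len(search)
                break
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


rules = parse_rules("r rg rr ar gh", "1 2 3 4 5")
print(replace_rule_by_rule("arrgh", rules))   # 42h
print(replace_left_to_right("arrgh", rules))  # 42h
```

Both readings agree on “arrgh”, but they are not equivalent in general: for the input “arg” the rule-by-rule version gives “a2” (‘rg’ is listed before ‘ar’), while the left-to-right scan gives “4g”.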

> Of course there are transforms we won't be able to express without it,

One thing that should be possible to describe is ‘titlecase’.

> and I think restricting ourselves to being character based is a reasonable
> place to draw the line.

When you do that, do it in a way that can be easily improved on in future levels. Required spaces seem like the better choice in this regard.
Received on Monday, 5 December 2011 13:08:06 UTC