[css3-text] clarifying text-transform:capitalize

I realize this has come up before (a thread in Feb 2011, for example), but I'd like to see if we can get a little more clarity - and perhaps eventually interoperability - regarding the ‘capitalize’ transform. (Yes, I know it'll never be "perfect"!)

The current text at http://www.w3.org/TR/css3-text/#text-transform says

  ‘capitalize’
    Puts the first _letter_ of each word in titlecase; other characters are unaffected.

where _letter_ is defined as

  A letter for the purpose of this specification is a character belonging to one of the Letter or Number general categories in Unicode. [UAX44]

and we're warned that: "The definition of "word" used for ‘capitalize’ is UA-dependent; [UAX29] is suggested (but not required) for determining such word boundaries."
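As a rough illustration of that _letter_ definition (a sketch using Python's unicodedata module; the helper name is_css_letter is my own, not anything from the spec), the test boils down to checking the first character of the general category code:

```python
import unicodedata

def is_css_letter(ch: str) -> bool:
    """True if ch belongs to a Letter (L*) or Number (N*) general category."""
    return unicodedata.category(ch)[0] in "LN"

# A few sample characters: letters and numbers qualify, punctuation does not.
for ch in ["a", "Σ", "5", "Ⅷ", "_", "-", "*"]:
    print(repr(ch), unicodedata.category(ch), is_css_letter(ch))
```

Note that digits count as "letters" here, so in a word such as "2nd" the character selected for titlecasing would be the "2", which has no case mapping and stays as it is.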

I put a small test case at http://people.mozilla.org/~jkew/capitalize.html, which applies the ‘capitalize’ transform to two lines of text:
A.   (this) “is” [a] –short– -test- «for» *the* _css_ ¿capitalize? ¡transform!
     ⓐⓑⓒ (ⓓⓔⓕ) —ⓖⓗⓘ— ⓙkl

While this is obviously an unnaturally cluttered example, my expectation as an author of what ‘capitalize’ should do for the first line is pretty clear:
B1.  (This) “Is” [A] –Short– -Test- «For» *The* _Css_ ¿Capitalize? ¡Transform!

The second line is more questionable, depending on whether the circled letters are considered as making up "words" or not:
B2a. Ⓐⓑⓒ (Ⓓⓔⓕ) —Ⓖⓗⓘ— Ⓙkl
B2b. ⓐⓑⓒ (ⓓⓔⓕ) —ⓖⓗⓘ— ⓙKl

However, none of the browsers I tried entirely matches this:

Firefox:
C.   (This) “Is” [A] –short– -test- «For» *The* _css_ ¿Capitalize? ¡Transform!
     Ⓐⓑⓒ (Ⓓⓔⓕ) —ⓖⓗⓘ— Ⓙkl

Safari 5, Chrome:
D.   (This) “Is” [A] –Short– -Test- «For» *The* _css_ ¿Capitalize? ¡Transform!
     Ⓐⓑⓒ (Ⓓⓔⓕ) —Ⓖⓗⓘ— Ⓙkl

IE9:
E.   (This) “Is” [A] –Short– -Test- «For» *The* _Css_ ¿Capitalize? ¡Transform!
     ⓐⓑⓒ (ⓓⓔⓕ) —ⓖⓗⓘ— ⒿKl

Firefox (C) uses the same punctuation test as for ‘first-letter’ (see http://www.w3.org/TR/selectors/#first-letter) when deciding to skip over word-initial non-letters. That test covers the "open", "close", "initial", "final" and "other" punctuation categories (GC=Ps,Pe,Pi,Pf,Po) but does not include the "connector" and "dash" categories (GC=Pc,Pd), and hence "short", "test" and "css" are not capitalized.
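To make that concrete, here is a small Python model of the behaviour described above. It is only a sketch of the observed results, not Gecko's actual code: I'm assuming words are separated by spaces, and the function names are made up for illustration.

```python
import unicodedata

# Punctuation classes skipped by the first-letter test: open, close,
# initial, final and other. Connector (Pc) and dash (Pd) are absent.
FIRST_LETTER_PUNCT = {"Ps", "Pe", "Pi", "Pf", "Po"}

def capitalize_word(word: str, skip=FIRST_LETTER_PUNCT) -> str:
    """Skip leading characters in the `skip` categories, then uppercase
    whatever character comes next (a no-op if it has no case mapping)."""
    i = 0
    while i < len(word) and unicodedata.category(word[i]) in skip:
        i += 1
    if i == len(word):
        return word
    # str.upper() stands in for titlecasing; the two agree for every
    # character in this test.
    return word[:i] + word[i].upper() + word[i + 1:]

def capitalize_line(line: str) -> str:
    """Assumption: words are delimited by single spaces."""
    return " ".join(capitalize_word(w) for w in line.split(" "))

print(capitalize_line("(this) “is” [a] –short– -test- «for» *the* _css_ ¿capitalize? ¡transform!"))
print(capitalize_line("ⓐⓑⓒ (ⓓⓔⓕ) —ⓖⓗⓘ— ⓙkl"))
```

Because a leading dash or underscore is not skipped, it becomes the "first character" of the word and uppercasing it does nothing, which reproduces both lines of result C.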

The WebKit browsers (D) appear to capitalize after "dash" punctuation, but not after "connector", so "short" and "test" are capitalized but "css" is not.

IE9 (E) successfully capitalizes all the words in my first line in the way I'd probably expect as an author.

The second line of the test uses Unicode circled letters, which are not actually in the "Letter or Number general categories" (their general category is So, "other symbol"), and so ought to be untouched by ‘capitalize’ according to the current CSS3 Text definition. However, both Gecko and WebKit _do_ capitalize them just like normal letters - including the Gecko failure when preceded by a dash. So they're aiming for something like B2a, modulo the varying punctuation treatment.
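The categorization is easy to check with Python's unicodedata module (a quick sketch; the variable names are mine). It also shows why an engine might capitalize these characters anyway: they do have case mappings even though they are not Letters.

```python
import unicodedata

circled = "ⓐⓑⓒⓓⓔⓕⓖⓗⓘⓙ"

# General category of each circled letter used in the test case.
categories = {ch: unicodedata.category(ch) for ch in circled}
for ch, cat in categories.items():
    print(ch, cat, unicodedata.name(ch))

# Despite not being Letters, the circled forms carry uppercase mappings.
print(circled.upper())
```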

IE9, on the other hand, does something more surprising (IMO) - it doesn't capitalize the circled letters (i.e., it seems to be aiming for B2b), _except_ in the case where the initial circled letter is followed by a normal (ASCII) letter. In that case, it capitalizes both the circled initial letter _and_ the following normal one.

What I'd like to confirm - hence this message - is:

(a) that we're happy with the CSS3 Text definition here, as far as it goes, in particular recognizing that it means certain word-initial punctuation characters will be treated differently by ‘first-letter’ and ‘capitalize’; and

(b) that the circled letters, being categorized as Symbols rather than Letters, should _not_ be affected by ‘capitalize’ (i.e. the expected result is B2b).

If so, it looks like we all have some bugs to fix, though to varying degrees...
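For reference, a literal reading of (a) and (b) can be sketched in Python as below. This is my own illustration rather than spec text: it assumes space-delimited words (the spec leaves "word" UA-dependent), and the function names are hypothetical.

```python
import unicodedata

def is_css_letter(ch: str) -> bool:
    """Letter (L*) or Number (N*) general category, per the CSS3 Text definition."""
    return unicodedata.category(ch)[0] in "LN"

def capitalize_word(word: str) -> str:
    """Uppercase the first Letter/Number in the word; everything else,
    including symbols such as circled letters, is left untouched."""
    for i, ch in enumerate(word):
        if is_css_letter(ch):
            # str.upper() stands in for titlecasing; they agree for this test.
            return word[:i] + ch.upper() + word[i + 1:]
    return word

def capitalize_line(line: str) -> str:
    """Assumption: words are delimited by single spaces."""
    return " ".join(capitalize_word(w) for w in line.split(" "))

print(capitalize_line("(this) “is” [a] –short– -test- «for» *the* _css_ ¿capitalize? ¡transform!"))
print(capitalize_line("ⓐⓑⓒ (ⓓⓔⓕ) —ⓖⓗⓘ— ⓙkl"))
```

Under those assumptions the first line comes out as B1 and the second as B2b: the circled letters are skipped as non-letters, so in the last word it is the "k" that gets capitalized.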

JK

Received on Thursday, 1 March 2012 14:15:35 UTC