RE: [css3-text] text-transform:capitalize (was New WD of CSS Text Level 3

Given Brady's suggestion, I've been thinking about what options we have for the property value.

If we read the current spec[1] carefully, it mentions:

> Although limited, the case mapping process has
> some language dependencies. Some well known
> examples are Turkish and Greek. If the content
> language is known then any such language-specific
> rules must be used.

As I read it, the French case must be supported. That is fine spec-wise, but implementation and testing would be very difficult.
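A minimal sketch of what "language-specific rules" means for the Turkish example, in Python. The function name `upper_for_language` and its `lang` parameter are hypothetical and come from neither the spec nor any browser; only the dotted/dotless i rule is handled.

```python
def upper_for_language(text, lang=None):
    """Uppercase text, applying the Turkish/Azeri dotted-i rule if asked.

    Hypothetical helper for illustration only: a real implementation
    would need the full set of language-sensitive Unicode mappings.
    """
    if lang in ("tr", "az"):
        # In Turkish, "i" uppercases to U+0130 (capital I with dot above),
        # not to the plain ASCII "I".
        text = text.replace("i", "\u0130")
    # The default mapping already turns dotless U+0131 into "I".
    return text.upper()


print(upper_for_language("istanbul"))        # default mapping
print(upper_for_language("istanbul", "tr"))  # Turkish mapping
```

The same input gives two different results depending on the content language, which is why each such rule has to be implemented and tested per language.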

I also checked existing products. InDesign supports uppercase and small caps, but not titlecase. MS Word does the same.

Given these, I think the options we have are:

1. Give up this feature in CSS3 Text. There's no way for all browsers to implement this correctly and in an interoperable way for every language and every case. No major products support this feature. We're trying to do too much.
2. Make this completely up to the UA (similar to the current text). No interoperability is guaranteed.
3. Support whatever Unicode defines today, and ensure interoperability at that level of features.
3.1. and allow UAs to implement features like the French case Brady raised. No spec/interoperability for that, though.
3.2. and prohibit further extensions for the value. UAs may implement their own non-standard values to support further language-specific conversions.
4. Figure out the correct titlecase logic (if one even exists) and write it down in this spec.

I think option 2 is not a good approach, and option 4 is not realistic.

Option 3 seems to be "better than nothing", but it may be too English-centric a decision, and if many people feel so, we might end up with option 1. I personally prefer 3 (specifically 3.1), but I also know that sometimes "not good for everyone" wins over "good for one person but not for others", so I think there are reasons to choose option 1.

Opinions?

[1] http://dev.w3.org/csswg/css3-text/#text-transform


Regards,
Koji

-----Original Message-----
From: www-style-request@w3.org [mailto:www-style-request@w3.org] On Behalf Of Koji Ishii
Sent: Sunday, February 20, 2011 9:55 PM
To: Brady Duga
Cc: Xaxio Brandish; John Cowan; W3C style mailing list; 'WWW International' (www-international@w3.org)
Subject: RE: [css3-text] text-transform:capitalize (was New WD of CSS Text Level 3

> Are we suggesting language-specific changes to UAX#29?

No, I'm looking for an appropriate level of features for the CSS text-transform:capitalize property to support, and in that sense, I think UAX #29 is a good candidate to define the level.

I didn't know about the French case you raised; thank you for letting us know about it.

I originally thought this was an easy feature that all major browsers already support, so we would just need to write the spec down. The discussion then revealed that none of them are interoperable today, and that doing it right for everyone and every case is pretty difficult.

Would you mind if I ask what level CSS text-transform:capitalize should support?


Regards,
Koji

-----Original Message-----
From: bradyduga@gmail.com [mailto:bradyduga@gmail.com] On Behalf Of Brady Duga
Sent: Sunday, February 20, 2011 8:14 AM
To: Koji Ishii
Cc: Xaxio Brandish; John Cowan; W3C style mailing list; 'WWW International' (www-international@w3.org)
Subject: Re: [css3-text] text-transform:capitalize (was New WD of CSS Text Level 3

Are we suggesting language-specific changes to UAX#29? For instance, the proper French titlecase of "l'histoire de france pour les nuls" is "L'Histoire de France pour les Nuls", not "L'histoire de France pour les Nuls". Ignoring the fact that this would result in more caps than expected (de, pour and les), it seems like there is no way to get both French (l'histoire -> L'Histoire) and English (can't -> Can't) titlecasing without using language-specific word break tables.
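The conflict can be sketched in a few lines of Python. This is only an illustration: the regular expressions are a rough stand-in for real UAX#29 word breaking, and the function name `titlecase_words` and its flag are hypothetical.

```python
import re

def titlecase_words(text, apostrophe_breaks_words):
    """Titlecase the first character of each word.

    Hypothetical helper: the single flag stands in for a whole
    language-specific word break table.
    """
    if apostrophe_breaks_words:
        # French-like rule: l'histoire is two words, "l" and "histoire".
        pattern = r"[^\W_]+"
    else:
        # English-like rule: can't is a single word.
        pattern = r"[^\W_]+(?:['\u2019][^\W_]+)*"
    return re.sub(pattern,
                  lambda m: m.group(0)[0].title() + m.group(0)[1:],
                  text)


for breaks in (False, True):
    print(breaks,
          titlecase_words("l'histoire", breaks),
          titlecase_words("can't", breaks))
```

Whichever rule is chosen, one of the two inputs comes out wrong: the English-like rule gives "L'histoire", and the French-like rule gives "Can'T".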

--Brady

On Feb 19, 2011, at 2:35 PM, Koji Ishii wrote:


John and Xaxio, thank you very much for steering this issue in the right direction.

It looks like this is the way to go:
1. Use UAX#29 Word Boundaries[1] to delimit words.
2. Take the first letter or numeric of each word, and if it's a letter, use Unicode titlecase.

There are two problems with this approach:
1. UAX#29 defines "a.a" as a word and therefore it doesn't solve the "a.m." case Xaxio raised.
2. No browser uses this logic today.

If we modify UAX#29 to delimit words at "." U+002E and U+FF0E FULLWIDTH FULL STOP, Safari and Chrome seem to be very close. I tested all the punctuation listed in UAX#29 and these two are the only exceptions (I haven't tested whether all other punctuation delimits words, though).

So here's the modified proposal:
1. Exclude U+002E and U+FF0E from MidNumLet in UAX#29 Word Boundaries and use it to delimit words.
2. Take first letter-or-numeric of words (skip punctuation) and if it's a letter, use Unicode titlecase.
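A rough Python sketch of the two steps above, with and without the full stops among the mid-word characters. It is not a UAX#29 implementation: the word pattern is a simplification, the two character sets are a small hand-picked subset chosen for illustration, and the name `capitalize_words` is hypothetical.

```python
import re

# Assumed, incomplete stand-ins for the UAX#29 mid-word characters.
MID_WITH_DOT = "'\u2019.\uff0e"     # unmodified: full stops join words
MID_WITHOUT_DOT = "'\u2019"         # modified: U+002E and U+FF0E removed

def capitalize_words(text, mid_chars):
    """Titlecase the first letter-or-numeric of each word if it is a letter."""
    mid = "[" + re.escape(mid_chars) + "]"
    # A word: alphanumeric runs, optionally joined by one mid-word character.
    word = re.compile(r"[^\W_]+(?:" + mid + r"[^\W_]+)*")

    def fix(match):
        w = match.group(0)
        # Leading punctuation is never part of the match, so w[0] is the
        # first letter or numeric; digits are left untouched.
        return w[0].title() + w[1:] if w[0].isalpha() else w

    return word.sub(fix, text)


print(capitalize_words("9 a.m. can't", MID_WITH_DOT))
print(capitalize_words("9 a.m. can't", MID_WITHOUT_DOT))
```

With the full stop kept as a mid-word character, "a.m." stays one word and only the "a" is capitalized; with it removed, both letters are.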

I don't think we need to worry about "O'Donnell", as it's unlikely that someone would write it as "o'donnell" and apply titlecase to it.

IE9 seems to have "." as one of its exceptions for delimiting words, which makes me worry that there may be counterexamples to "a.m.", i.e., cases where "." should not delimit words. Does anyone have any ideas?

[1] http://www.unicode.org/reports/tr29/#Word_Boundaries


Regards,
Koji

-----Original Message-----
From: Xaxio Brandish [mailto:xaxiobrandish@gmail.com]
Sent: Sunday, February 20, 2011 6:28 AM
To: John Cowan
Cc: Koji Ishii; W3C style mailing list; 'WWW International' (www-international@w3.org)
Subject: Re: [css3-text] text-transform:capitalize (was New WD of CSS Text Level 3

John,

I was thinking about commenting on this as well, but I hesitated due to the characters in Japanese not being technically "letters".  I'm glad that you said something, because at least I was thinking along the right lines.  I also hesitated because I was wondering if "word" in the description covers only letters and already excludes punctuation.

Perhaps "word" should be defined as "characters excluding punctuation and whitespace".  In Firefox and Chrome tests, numbers directly before letters keep the letters from receiving capitalization when using this property.

Also, what about names like O'Donnell?  Are these kinds of cases undetectable for the purpose of applying this property...?

--Xaxio
On Sat, Feb 19, 2011 at 1:16 PM, John Cowan <cowan@mercury.ccil.org> wrote:
Koji Ishii scripsit:


> Transforms the first character in each word to uppercase; all other characters remain unaffected; i.e., they're not transformed to lowercase, but will appear as written in the document.

It seems to me that it is better to speak of the "first letter with case".
For example, "'tis" (short for "it is") titlecases to "'Tis", not "'tis".
Similarly, the word "!Kung" (the name of a South African people) is correctly so capitalized whether the "!" is the punctuation mark or the identical-looking U+01C3, a caseless letter.  (The Dutch words 't, 's, and 'n never get capitalized, but we can't have everything.)
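A small Python sketch of the "first letter with case" idea, under the assumption that a character counts as cased when Python reports it as upper-, lower-, or titlecase. The helper names are hypothetical, and word splitting is left out entirely.

```python
def is_cased(ch):
    # Assumption: "cased" means upper-, lower-, or titlecase in Python's
    # view of the Unicode character database.
    return ch.isupper() or ch.islower() or ch.istitle()

def capitalize_first_cased(word):
    """Titlecase the first cased character, skipping anything caseless."""
    for i, ch in enumerate(word):
        if is_cased(ch):
            return word[:i] + ch.title() + word[i + 1:]
    return word  # no cased character at all, e.g. digits only


print(capitalize_first_cased("'tis"))
print(capitalize_first_cased("!kung"))        # ASCII exclamation mark
print(capitalize_first_cased("\u01c3kung"))   # U+01C3, a caseless letter
```

Both spellings of the click letter give the same capitalization, because neither "!" nor U+01C3 has case.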

Furthermore, the Croatian double letters dj, lj, nj, and dz-with-caron must be correctly titlecased to Dj, Lj, Nj, and Dz-with-caron, whether they are represented with one character or two.  Unicode already provides a titlecase mapping that handles these and other two-letter characters.
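As an illustration, Python's `str.title()` applies the Unicode titlecase mapping, so the difference between uppercase and titlecase for a single-character digraph can be seen directly. The helper name `titlecase_first` is hypothetical.

```python
import unicodedata

def titlecase_first(word):
    """Apply the Unicode titlecase mapping to the first character only."""
    return word[:1].title() + word[1:]

# Single-character digraphs: dz-with-caron, lj, nj.
for ch in ("\u01c6", "\u01c9", "\u01cc"):
    print(unicodedata.name(ch))
    print("  upper:", unicodedata.name(ch.upper()))
    print("  title:", unicodedata.name(ch.title()))

# The one-character and two-character spellings both titlecase sensibly.
print(titlecase_first("\u01c9ubav"), titlecase_first("ljubav"))
```

Using the uppercase mapping here would give the all-capital digraph (LJ) instead of the titlecase one (Lj).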

--
It was impossible to inveigle           John Cowan <cowan@ccil.org>
Georg Wilhelm Friedrich Hegel           http://www.ccil.org/~cowan
Into offering the slightest apology
For his Phenomenology.                      --W. H. Auden, from "People" (1953)

Received on Monday, 21 February 2011 04:38:10 UTC