Re: [css3-text] text-transform:capitalize from fantasai on 2011-02-23 (www-international@w3.org from January to March 2011)

From: fantasai <fantasai.lists@inkedblade.net>
Date: Tue, 22 Feb 2011 17:40:29 -0800
To: Koji Ishii <kojiishi@gluesoft.co.jp>
CC: Brady Duga <duga@ljug.com>, Asmus Freytag <asmusf@ix.netcom.com>, Mark Davis ☕ <mark@macchiato.com>, Xaxio Brandish <xaxiobrandish@gmail.com>, John Cowan <cowan@mercury.ccil.org>, Christoph Päper <christoph.paeper@crissov.de>, W3C style mailing list <www-style@w3.org>, "www-international@w3.org" <www-international@w3.org>, Bert Bos <bert@w3.org>, Håkon Wium Lie <howcome@opera.com>
Message-ID: <4D64658D.6080807@inkedblade.net>

On 02/21/2011 10:49 PM, Koji Ishii wrote:
> Thank you all for the great contributions, it looks like we're in the consensus for the following points:
>
> 1. The feature should rely on Unicode to define its scope
> 2. The name of the value should stay unchanged
> 3. The wording "language-specific rules *must* be used"[1] should be weakened
>    at least for this value as language-specific rules for this value is more
>    complicated than upper/lower. We'd like to allow UAs to implement
>    language-specific rules, but we might not be able to test and make them
>    interoperable.
> 4. Use UAX#29 for word break
> 5. Apply Titlecase_Mapping defined in Unicode[2] to the first letter of every word

I'd like to add a couple clarifications to what we're discussing here.

1. The 'capitalize' value has existed since CSS1. [1] Its value name absolutely
    cannot be changed, nor can the value be dropped from the spec.

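2. The spec uses the term titlecase because it intends to use the titlecase
    character mappings from Unicode. As several people have pointed out,
    using the uppercase tables would not give correct results for digraph
    characters like dz, where a single character displays as two letters:
    the uppercase mapping yields DZ, while the titlecase mapping yields Dz.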

3. Similarly, the requirement for language-specific rules is for the case-mapping
    rules, not for the word-boundary rules.
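
To illustrate point 2, here is a minimal Python sketch (not part of the spec
text). It assumes Python's built-in str.upper() and str.title() follow the
Unicode uppercase and titlecase mappings, and it uses U+01F3, the
single-character dz digraph, as the example. The helper names are hypothetical.

```python
# U+01F3 is the lowercase "dz" digraph encoded as ONE character.
# Its uppercase form is U+01F1 (DZ) and its titlecase form is U+01F2 (Dz).
DZ_SMALL = "\u01f3"


def capitalize_with_upper(word: str) -> str:
    """Hypothetical helper: apply the UPPERCASE mapping to the first character."""
    return word[:1].upper() + word[1:]


def capitalize_with_title(word: str) -> str:
    """Hypothetical helper: apply the TITLECASE mapping to the first character."""
    return word[:1].title() + word[1:]


if __name__ == "__main__":
    word = DZ_SMALL + "en"
    for label, result in (
        ("upper", capitalize_with_upper(word)),
        ("title", capitalize_with_title(word)),
    ):
        # Print the code point of the first character so the difference is
        # visible even when the font renders the digraphs similarly.
        print(label, result, "U+%04X" % ord(result[0]))
```

For ordinary letters the two helpers agree; they differ only for characters
whose titlecase mapping is distinct from the uppercase one.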

The issue here is the word-boundary and word-selection rules, which vary by
language and indeed vary within a language (sometimes depending on usage of the
phrase, not even on the orthographic variant). Bert and Håkon may correct me
if I'm wrong, but I believe the original intention was for the 'capitalize'
value to be very simple and not depend on context or language. The original
spec text, "first character of each word", is not correct even for English, and
I am sure they were aware of this when they wrote it. Punctuation complicates
the situation: one cannot rely only on space-detection to capitalize words.
For that reason we may want to rely on UAX#29 to achieve interoperability.
But I think to expect UAs to do anything more intelligent than that is asking
too much.
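
As a rough illustration of the punctuation problem, the Python sketch below
compares space-based splitting with a simple word-run match. It does NOT
implement UAX#29; the regular expression is only a stand-in assumed for
brevity, and both function names are hypothetical.

```python
import re

# Stand-in for a real word segmenter: a run of letters/digits, optionally
# continued across internal apostrophes (so "don't" stays one word).
# This is an approximation, not the UAX#29 algorithm.
WORD_RUN = re.compile(r"[^\W_]+(?:'[^\W_]+)*")


def capitalize_by_spaces(text: str) -> str:
    """Titlecase the first character of every space-delimited chunk."""
    return " ".join(chunk[:1].title() + chunk[1:] for chunk in text.split(" "))


def capitalize_by_word_runs(text: str) -> str:
    """Titlecase the first character of every matched word run."""
    def fix(match: re.Match) -> str:
        word = match.group(0)
        return word[:1].title() + word[1:]

    return WORD_RUN.sub(fix, text)


if __name__ == "__main__":
    sample = '(hello) "world" don\'t stop'
    # Space detection leaves words behind punctuation untouched.
    print(capitalize_by_spaces(sample))
    # Word-run matching skips the leading punctuation.
    print(capitalize_by_word_runs(sample))
```

With space detection alone, the first character of `(hello)` is the
parenthesis, so the word itself is never capitalized; a boundary-aware
approach avoids that without needing any language-specific knowledge.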

[1] http://www.w3.org/TR/CSS1/#text-transform

~fantasai

Received on Wednesday, 23 February 2011 01:41:14 UTC