[css3-text] text-transform:capitalize (was New WD of CSS Text Level 3)

Hi Xaxio,

> Section 3.1 uses the word "titlecase", but according to
> Wikipedia, this isn't standardized:
> http://en.wikipedia.org/wiki/Letter_case#Choice_of_case_in_text


> My question in regard to that is: Should this be better defined?
> I ask because one implementation of title-case may use "The"
> and another may use "the" (which can PO quite a few authors
> [both CSS authors and book authors]).  If a CSS author uses
> title-case and finds that "The" capitalizes differently on
> different browsers, that could be reason to go back into the
> source document and manually capitalize the text there.
> The spread of that kind of frustration could cause this property
> to be ignored in key places in more widely used publications.

This is very good feedback, thank you. I thought we had a somewhat more detailed description here, but I can't find one in the CVS history, so I must have dreamed about it without ever writing it down.

---
Transforms the first character in each word to uppercase; all other characters remain unaffected; i.e., they're not transformed to lowercase, but will appear as written in the document.

The word in this definition is valid only for scripts that use spaces to delimit words.
---

The problem, in the second paragraph, is defining what a "word" is in this context. That is always ambiguous, as we know, but we also know that the use case here is emphasis, usually in headings, and we want to keep the rule as simple as possible.

Also, all major browsers (IE, Firefox, Safari, Chrome, Opera) have already implemented this value, so we want the definition to be as compatible as possible with existing browsers.

Under these circumstances, I ran some simple tests.

A. Are the 2nd and later characters affected?
"all lower cases and ALL UPPER CASES"
transforms to
"All Lower Cases And ALL UPPER CASES"
in all 5 browsers, so the 1st paragraph above seems to be good.

B. Should "the" be "The" or "the"? (part of your question)
All 5 browsers transform "the" to "The". We don't have any dictionary-based intelligence here.
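
For reference, the rule in the first paragraph, read literally with whitespace as the only word delimiter, could be sketched as follows. This is only a minimal Python sketch under that assumption, not any browser's implementation; the function name is hypothetical, and Python's str.upper() stands in for whatever case mapping a browser applies.

```python
def capitalize_space_delimited(text: str) -> str:
    """Uppercase the first character after whitespace (or at the start);
    leave every other character exactly as written."""
    out = []
    at_word_start = True
    for ch in text:
        if ch.isspace():
            # Whitespace is the only delimiter in this model.
            out.append(ch)
            at_word_start = True
        elif at_word_start:
            out.append(ch.upper())
            at_word_start = False
        else:
            # Later characters are never lowercased.
            out.append(ch)
    return "".join(out)


print(capitalize_space_delimited("all lower cases and ALL UPPER CASES"))
print(capitalize_space_delimited("the"))
```

On the strings from A and B this model gives the same results as all 5 browsers. No dictionary is involved, so "the" always becomes "The".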

C. Punctuation
"the.the" transforms to:
C.1: IE, Firefox, Opera: "The.the"
C.2: Safari, Chrome: "The.The"
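
One possible way to model the difference between C.1 and C.2 is where a "word" is allowed to start. The Python sketch below is an assumption for illustration only: the function names are hypothetical, and it is checked only against the single test string above, so it does not claim to describe what any of the browsers really do.

```python
def capitalize_c1(text: str) -> str:
    """C.1-like model: a word starts only at the beginning or after whitespace."""
    out = []
    at_word_start = True
    for ch in text:
        out.append(ch.upper() if at_word_start else ch)
        at_word_start = ch.isspace()
    return "".join(out)


def capitalize_c2(text: str) -> str:
    """C.2-like model: a word starts at any letter that follows a non-letter."""
    out = []
    prev_is_letter = False
    for ch in text:
        is_letter = ch.isalpha()
        out.append(ch.upper() if is_letter and not prev_is_letter else ch)
        prev_is_letter = is_letter
    return "".join(out)


print(capitalize_c1("the.the"))  # The.the
print(capitalize_c2("the.the"))  # The.The
```

Under the C.2-like model every punctuation character starts a new word, so a string such as "e.g." would become "E.G.", which may or may not be what an author expects.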

D. Mixed Scripts (East Asia)
"theあthe" (U+3042 HIRAGANA A[1]) transforms to:
D.1: Firefox, Opera: "Theあthe"
D.2: IE, Safari, Chrome: "TheあThe"

E. Mixed Scripts (RTL)
"theٮthe" (U+066E ARABIC LETTER DOTLESS BEH) transforms to:
E.1: IE, Firefox, Opera, Safari, Chrome: "Theٮthe"

F. Mixed Scripts (Southeast Asia)
"theกthe" (U+0E01 THAI CHARACTER KO KAI) transforms to:
F.1: IE, Firefox, Opera, Safari, Chrome: "Theกthe"
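
To see whether D.2 could be explained by one simple rule, here is a Python sketch of a naive model in which any character without case (punctuation, or a letter from a script that has no upper/lower distinction) ends the current word. The model and its function names are hypothetical, used only to compare against the results in D, E and F.

```python
def is_cased(ch: str) -> bool:
    """True if the character changes under upper- or lowercasing."""
    return ch.upper() != ch or ch.lower() != ch


def capitalize_cased_runs(text: str) -> str:
    """Naive model: uppercase the first character of every run of cased characters."""
    out = []
    prev_cased = False
    for ch in text:
        cased = is_cased(ch)
        out.append(ch.upper() if cased and not prev_cased else ch)
        prev_cased = cased
    return "".join(out)


hiragana = capitalize_cased_runs("the\u3042the")  # U+3042 HIRAGANA A
arabic = capitalize_cased_runs("the\u066ethe")    # U+066E ARABIC LETTER DOTLESS BEH
thai = capitalize_cased_runs("the\u0e01the")      # U+0E01 THAI CHARACTER KO KAI

print(hiragana == "The\u3042The")  # True: same as D.2
print(arabic == "The\u066ethe")    # False: differs from E.1
print(thai == "The\u0e01the")      # False: differs from F.1
```

This naive model reproduces D.2 but not E.1 or F.1, so the browsers that show D.2 are presumably doing something more specific to East Asian characters than "any caseless character is a word boundary".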

I now have the following questions for you all:
1. Are there any cases we should consider other than the above?
2. For C. Punctuation, which is the right behavior? C.2 seems right given the general definition of "word", but for this property I guess C.1 is safer and C.2 doesn't have good use cases. I'm not sure, though.
3. For D. Mixed Scripts (East Asia), which is the right behavior? Given the use case, my preference is D.2.
4. Are these test cases correct, especially E and F? I guess E isn't a real use case because, as far as I understand, Arabic text puts a space between English and Arabic, but I'm not sure.
5. Are the behaviors for E and F correct?
 
[1] http://www.unicode.org/charts/PDF/U3040.pdf


Regards,
Koji

Received on Saturday, 19 February 2011 11:32:38 UTC