
Re: [css3-text] text-transform:capitalize (was New WD of CSS Text Level 3

From: Xaxio Brandish <xaxiobrandish@gmail.com>
Date: Sat, 19 Feb 2011 12:56:48 -0800
Message-ID: <AANLkTikrBsk71Uy22ZjMtXcPcABm4LcG7vmTjzEwgc=_@mail.gmail.com>
To: Koji Ishii <kojiishi@gluesoft.co.jp>
Cc: W3C style mailing list <www-style@w3.org>, "'WWW International' (www-international@w3.org)" <www-international@w3.org>

Koji and Thomas,

Thanks for all of your feedback and assistance in helping me understand the
spec better.

In regard to question 2, you wouldn't want "Cat That'S Nearly Home" as a
headline (from "cat that's nearly home"), but you would want "Cat
Successfully Escapes Dog/Wolf" (from "cat successfully escapes dog/wolf").
Both Chrome and Firefox handle the first headline correctly.  Chrome also
handles the second headline correctly, but Firefox does not.  Punctuation
handling seems to be a mixed bag.

"The.the" is one way to look at it, but a more practical example may be
"a.m." and "p.m.".  Looking at the rules of popular style guides for
publication writing, these should NEVER be mixed case (see
http://www.businesswritingblog.com/business_writing/2009/06/what-is-the-correct-time-am-pm-am-pm-am-pm-.html).
Chrome handles this correctly, but Firefox mixes the case.
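
As a rough illustration of why the results differ, here is a minimal Python
sketch of two possible word-boundary rules.  It is not any browser's actual
algorithm: the function names are hypothetical, and keeping the apostrophe
inside a word is my own assumption, made so that "that's" stays one word.

```python
def capitalize_words(text, is_boundary):
    """Uppercase the first character after each boundary.

    All other characters are left exactly as written, matching the
    "remain unaffected" wording of the draft definition.
    """
    out = []
    at_word_start = True
    for ch in text:
        if is_boundary(ch):
            # Boundary characters are copied through and start a new word.
            at_word_start = True
            out.append(ch)
        elif at_word_start:
            out.append(ch.upper())
            at_word_start = False
        else:
            out.append(ch)
    return "".join(out)


def whitespace_only(ch):
    # Assumed rule 1: only whitespace separates words.
    return ch.isspace()


def whitespace_or_punctuation(ch):
    # Assumed rule 2: any non-alphanumeric character separates words,
    # except the apostrophe, which is kept inside the word.
    return ch.isspace() or (not ch.isalnum() and ch != "'")


for sample in ("cat that's nearly home", "dog/wolf", "a.m.", "the.the"):
    print(repr(sample),
          repr(capitalize_words(sample, whitespace_only)),
          repr(capitalize_words(sample, whitespace_or_punctuation)))
```

Under the first rule "a.m." becomes "A.m." and "dog/wolf" becomes "Dog/wolf";
under the second they become "A.M." and "Dog/Wolf".  Both rules leave
"That's" alone, which is the outcome a headline needs.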

Just in case the question is raised, let's cover a "why...".  Why would
somebody use text-transform: capitalize in a publication if they're very
concerned about the case being different between user agents?  The answer is
that they should be able to rely on a standard to be the standard.  With
data sources on the web, you may want a document to look professional
regardless of where the content comes from, and there are cases where you
won't always have editorial control over the source of your content (RSS
feeds, data input errors, etc.).

--Xaxio

On Sat, Feb 19, 2011 at 3:31 AM, Koji Ishii <kojiishi@gluesoft.co.jp> wrote:

> Hi Xaxio,
>
> > Section 3.1 uses the word "titlecase", but according to
> > Wikipedia, this isn't standardized:
> > http://en.wikipedia.org/wiki/Letter_case#Choice_of_case_in_text
>
> > My question in regard to that is: Should this be better defined?
> > I ask because one implementation of title-case may use "The"
> > and another may use "the" (which can PO quite a few authors
> > [both CSS authors and book authors]).  If a CSS author uses
> > title-case and finds that "The" capitalizes differently on
> > different browsers, that could be reason to go back into the
> > source document and manually capitalize the text there.
> > The spread of that kind of frustration could cause this property
> > to be ignored in key places in more widely used publications.
>
> This is very good feedback, thank you. I thought we had a slightly more
> verbose description here, but I can't find it in the CVS history, so I must
> have dreamed about it and never actually written it.
>
> ---
> Transforms the first character in each word to uppercase; all other
> characters remain unaffected; i.e., they're not transformed to lowercase,
> but will appear as written in the document.
>
> The word in this definition is valid only for scripts that use spaces to
> delimit words.
> ---
>
> The problem is that the second paragraph has to define what a "word" is in
> this context. That is always ambiguous, as we know, but the use case here is
> emphasis, usually in headings, and we want to keep the rule as simple as
> possible.
>
> Also, all major browsers (IE, Firefox, Safari, Chrome, Opera) have already
> implemented this value, so we want the definition to be as compatible as
> possible with existing browsers.
>
> Under these circumstances, I ran some simple tests.
>
> A. Are the 2nd and later characters affected?
> "all lower cases and ALL UPPER CASES"
> transforms to
> "All Lower Cases And ALL UPPER CASES"
> in all 5 browsers, so the 1st paragraph above seems to be good.
>
> B. Should "the" be "The" or "the"? (part of your question)
> All 5 browsers transform "the" to "The". We don't have any dictionary-based
> intelligence here.
>
> C. Punctuation
> "the.the" transforms to:
> C.1: IE, Firefox, Opera: "The.the"
> C.2: Safari, Chrome: "The.The"
>
> D. Mixed Scripts (East Asia)
> "the&#x3042;the" (U+3042 HIRAGANA A[1]) transforms to:
> D.1: Firefox, Opera: "The&#x3042;the"
> D.2: IE, Safari, Chrome: "The&#x3042;The"
>
> E. Mixed Scripts (RTL)
> "the&#x066E;the" (U+066E ARABIC LETTER DOTLESS BEH) transforms to:
> E.1: IE, Firefox, Opera, Safari, Chrome: "The&#x066E;the"
>
> F. Mixed Scripts (Southeast Asia)
> "the&#x0E01;the" (U+0E01 THAI CHARACTER KO KAI) transforms to:
> F.1: IE, Firefox, Opera, Safari, Chrome: "The&#x0E01;the"
>
> Now I have the following questions for you all:
> 1. Are there any other cases we should consider besides those above?
> 2. For C. Punctuation, which is the right behavior? C.2 seems to be right
> given the general definition of "word", but for this property, I guess C.1
> is safer and C.2 doesn't have good use cases, but I'm not sure.
> 3. For D. Mixed Scripts (East Asia), which is the right behavior? My
> preference given the use case is D.2.
> 4. Are these test cases correct, especially for E and F? I guess E isn't a
> real use case because, as far as I understand, Arabic text puts a space
> between English and Arabic words, but I'm not sure.
> 5. Are the behaviors for E and F correct?
>
> [1] http://www.unicode.org/charts/PDF/U3040.pdf
>
> Regards,
> Koji
>
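
For anyone who wants to re-run cases A through F from the quoted message in
their own browsers, a small test page can be generated along these lines.
This is only a sketch: the function name and the page layout are hypothetical,
and the string for case B is assumed to be the bare word "the".

```python
from html import escape

# Test strings taken from cases A-F in the quoted message.
CASES = [
    ("A", "all lower cases and ALL UPPER CASES"),
    ("B", "the"),            # assumed input for case B
    ("C", "the.the"),
    ("D", "the\u3042the"),   # U+3042, hiragana
    ("E", "the\u066Ethe"),   # U+066E ARABIC LETTER DOTLESS BEH
    ("F", "the\u0E01the"),   # U+0E01 THAI CHARACTER KO KAI
]


def build_test_page(cases):
    """Return an HTML document with one capitalized span per test case."""
    rows = []
    for label, text in cases:
        rows.append(
            f"<p>{escape(label)}: "
            f'<span style="text-transform: capitalize">{escape(text)}</span></p>'
        )
    body = "\n".join(rows)
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        "<title>capitalize tests</title></head>\n"
        f"<body>\n{body}\n</body></html>\n"
    )
```

Writing the result of build_test_page(CASES) to a UTF-8 file and opening it in
each browser should show how that browser treats each string.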
Received on Saturday, 19 February 2011 22:11:06 GMT
