RE: [css3-text] text-transform:capitalize from Koji Ishii on 2011-02-22 (www-style@w3.org from February 2011)

From: Koji Ishii <kojiishi@gluesoft.co.jp>
Date: Mon, 21 Feb 2011 22:37:01 -0500
To: Xaxio Brandish <xaxiobrandish@gmail.com>, John Cowan <cowan@mercury.ccil.org>
CC: Christoph Päper <christoph.paeper@crissov.de>, W3C style mailing list <www-style@w3.org>, "'WWW International' (www-international@w3.org)" <www-international@w3.org>
Message-ID: <A592E245B36A8949BDB0A302B375FB4E0AB201D368@MAILR001.mail.lan>

Thank you Xaxio for your continued feedback. Yes, Koji is fine with me.

> have more questions about 3.2 (preventing further
> extensions for the value).  I'm not sure what an
> "extension of the value" means in this case (more
> values than already exist?).

What I meant in 3.2 was that, if people strongly prefer to prohibit language-dependent behavior for the "capitalize" value, we could allow browsers to add, say, "-webkit-capitalize-french" or any other non-standard values.

I agree that "must" should be weakened if we take 3.1, which is where we seem to be moving toward a consensus.

Regards,
Koji

-----Original Message-----
From: Xaxio Brandish [mailto:xaxiobrandish@gmail.com] 
Sent: Tuesday, February 22, 2011 6:09 AM
To: John Cowan
Cc: Koji Ishii; Christoph Päper; W3C style mailing list; 'WWW International' (www-international@w3.org)
Subject: Re: [css3-text] text-transform:capitalize

Good afternoon,

I like the idea of sticking to the Unicode standard.

Presenting the text that Koji (I hope that's respectful) mentioned,
"Although limited, the case mapping process has some language dependencies. Some well known examples are Turkish and Greek. If the content language is known then any such language-specific rules must be used."

it seems to me that these language-specific rules would include the a.m. and A.M. examples, as well as the French l'[....] example.
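
To make the "language-specific rules" in the quoted draft text concrete, here is a minimal Python sketch of a language-tailored capitalization of a single word. The function name, the set of language tags, and the single special case are assumptions made for illustration; the draft does not list any rules. Only the well-known Turkish/Azerbaijani dotted "i" is handled, and the a.m. and French l'[....] cases are deliberately left out.

```python
# Hypothetical helper: name, language tags, and the one special case
# below are assumptions for illustration, not taken from the draft.
TURKIC_LANGS = {"tr", "az"}


def capitalize_word(word, lang=None):
    """Capitalize the first letter of a word, with one language tailoring."""
    if not word:
        return word
    first, rest = word[0], word[1:]
    # In Turkish and Azerbaijani, lowercase "i" maps to dotted capital I (U+0130).
    if lang in TURKIC_LANGS and first == "i":
        return "\u0130" + rest
    # Default, language-independent behavior: Unicode titlecase of the first character.
    return first.title() + rest


print(capitalize_word("istanbul"))        # untailored result
print(capitalize_word("istanbul", "tr"))  # tailored result for Turkish
```

Without a known content language the function falls back to the plain Unicode mapping, which is the distinction between "MUST" and "SHOULD" that the following paragraph turns on.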

I may be misreading the draft, but using the Unicode standardization schemes plus this idea means that the issue at hand is already covered in the document but isn't being implemented correctly across all platforms.  If the interest is more in conforming to existing implementations, then perhaps "MUST" can be changed to "SHOULD".  I still vote for the "MUST" for headlining reasons, though.

Building upon this possible misread of the draft, the question becomes not whether "titlecase" should be redefined, but whether we should list the language-specific rules, which falls under Koji's suggestion #1 ("we're trying to do too much").

As such, I agree with Koji's suggestions 3 and 3.1, but have more questions about 3.2 (preventing further extensions for the value).  I'm not sure what an "extension of the value" means in this case (more values than already exist?).

1. Would it or would it not make more sense to call this value "titlecase" instead of "capitalize", since "capitalize" (according to Unicode 6 Chapter 5.18 that Koji mentioned) could refer to either the "titlecase" or "uppercase" of digraph characters?  This way, it keeps more with what is already defined in that specification.

2. Since the "language-specific rules must be used" text of the draft doesn't strictly keep with the Unicode standard (UAX #29 calls such rules a "tailoring"), should a value be added that reflects being able to use the Unicode titlecase ("titlecase"), as well as the CSS3 titlecase + language-specific rules ("capitalize")?
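
The titlecase/uppercase distinction for digraph characters raised in question 1 can be seen with a short Python sketch. The choice of U+01C6 as the example character is mine, not from the thread; Python's `str.upper()` and `str.title()` apply the Unicode uppercase and titlecase mappings respectively.

```python
import unicodedata

# Example digraph chosen for illustration: LATIN SMALL LETTER DZ WITH CARON.
DZ_SMALL = "\u01c6"

upper = DZ_SMALL.upper()  # Unicode uppercase mapping
title = DZ_SMALL.title()  # Unicode titlecase mapping

# Print the code point and character name of each form.
for label, ch in (("lower", DZ_SMALL), ("upper", upper), ("title", title)):
    print(label, f"U+{ord(ch):04X}", unicodedata.name(ch))
```

The two mappings give different characters for this digraph, which is why the name "capitalize" alone does not say which one is meant.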

--Xaxio

On Mon, Feb 21, 2011 at 9:54 AM, John Cowan <cowan@mercury.ccil.org> wrote:
Koji Ishii scripsit:

> Even for "." (PERIOD) case Xaxio raised, I'm still skeptical whether
> CSS should treat it differently from Unicode or not. I understand how
> "a.m." should be titlecased, but I haven't investigated if there were
> any counter-cases, nor asked if Unicode guys considered that case
> or not. Unicode guys must have reasons to make "." as MidNumLet,
> not MidNum. IE must have reasons to make "." not to break words in
> titlecasing, and WebKit must have reasons to break. I'm not saying
> that Xaxio is wrong, but just that we still know little to make the
> decision to do it differently from what Unicode defines.

No algorithmic solution can get all the cases correct.  As Don Knuth
says in the comments to the English-language TeX hyphenation tables,
if you want bath-ing to hyphenate correctly you will have to live with
noth-ing (which is tolerable, especially for Americans) and anyth-ing
(which is not).  He ends by saying "You can't have every-thing."

The argument for sticking to Unicode rules is that they mostly get
it right and (importantly) *they already exist*.
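
The two treatments of "." described in the quoted text can be sketched in Python as follows. This is only an illustration of the difference in outcome for "a.m."; the function names and regular expressions are assumptions, and neither function claims to reproduce the actual algorithm of any browser or the full UAX #29 word-boundary rules.

```python
import re

# A run of letters (word characters minus digits and underscore).
LETTERS = r"[^\W\d_]+"


def _cap(match):
    """Uppercase the first character of the matched word."""
    word = match.group(0)
    return word[0].upper() + word[1:]


def capitalize_breaking_on_period(text):
    """Treat '.' as a word break: every run of letters is its own word."""
    return re.sub(LETTERS, _cap, text)


def capitalize_keeping_period(text):
    """Treat '.' between letters as part of the word (MidNumLet-like)."""
    return re.sub(LETTERS + r"(?:\." + LETTERS + r")*", _cap, text)


sample = "at 9 a.m. sharp"
print(capitalize_breaking_on_period(sample))  # At 9 A.M. Sharp
print(capitalize_keeping_period(sample))      # At 9 A.m. Sharp
```

Either rule gets some inputs wrong, which is the point of the Knuth remark above.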

--
John Cowan                                cowan@ccil.org
I amar prestar aen, han mathon ne nen,    http://www.ccil.org/~cowan
han mathon ne chae, a han noston ne 'wilith.  --Galadriel, LOTR:FOTR

Received on Tuesday, 22 February 2011 03:37:00 UTC