
Re: [css3-text] text-transform:capitalize

From: Mark Davis ☕ <mark@macchiato.com>
Date: Mon, 21 Feb 2011 13:41:47 -0800
Message-ID: <AANLkTi=LWT4fP40ZnR-hbBKcWxW9tKrcGNLjMoU2tDsh@mail.gmail.com>
To: Xaxio Brandish <xaxiobrandish@gmail.com>
Cc: John Cowan <cowan@mercury.ccil.org>, Koji Ishii <kojiishi@gluesoft.co.jp>, Christoph Päper <christoph.paeper@crissov.de>, W3C style mailing list <www-style@w3.org>, "'WWW International' (www-international@w3.org)" <www-international@w3.org>
In the Unicode Consortium, language-specific rules, such as those for
titlecasing, fall under the CLDR technical committee <http://cldr.unicode.org/>. There
was a ticket filed for adding structure and data some time ago, but it
hadn't reached a high enough relative priority for the committee to work on.
If the W3C is interested in this, I can bring it up on the committee agenda.


On Mon, Feb 21, 2011 at 13:08, Xaxio Brandish <xaxiobrandish@gmail.com> wrote:

> Good afternoon,
> I like the idea of sticking to the Unicode standard.
> Presenting the text that Koji (I hope that's respectful) mentioned,
>> Although limited, the case mapping process has some language dependencies.
>> Some well known examples are Turkish and Greek. If the content language is
>> known then any such language-specific rules must be used.
> it seems to me that these language-specific rules would include the a.m.
> and A.M. examples, as well as the French l'[....] example.
> I may be misreading the draft, but using the Unicode standardization
> schemes plus this idea means that the issue at hand is already covered in
> the document, but that it isn't being implemented correctly across all
> platforms.  If the interest leans more toward conforming to
> existing implementations, then perhaps "MUST" can be changed to "SHOULD".  I
> still vote for the "MUST" for headlining reasons, though.
> Building upon this possible misread of the draft, the question becomes not
> whether "titlecase" should be redefined, but whether we should list the
> language-specific rules, which falls under Koji's suggestion #1
> ("we're trying to do too much").
> As such, I agree with Koji's suggestions 3 and 3.1, but have more questions
> about 3.2 (preventing further extensions for the value).  I'm not sure what
> an "extension of the value" means in this case (more values than already
> exist?).
> 1. Would it or would it not make more sense to call this value "titlecase"
> instead of "capitalize", since "capitalize" (according to Unicode 6 Chapter
> 5.18 that Koji mentioned) could refer to either the "titlecase" or
> "uppercase" of digraph characters?  This way, it keeps more with what is
> already defined in that specification.
> 2. Since the "language-specific rules must be used" text of the draft
> doesn't strictly keep with the Unicode standard (UAX #29 calls such rules a
> "tailoring"), should a value be added that reflects being able to use the
> Unicode titlecase ("titlecase"), as well as the CSS3 titlecase +
> language-specific rules ("capitalize")?
> --Xaxio
> On Mon, Feb 21, 2011 at 9:54 AM, John Cowan <cowan@mercury.ccil.org> wrote:
>> Koji Ishii scripsit:
>> > Even for "." (PERIOD) case Xaxio raised, I'm still skeptical whether
>> > CSS should treat it differently from Unicode or not. I understand how
>> > "a.m." should be titlecased, but I haven't investigated if there were
>> > any counter-cases, nor asked if Unicode guys considered that case
>> > or not. Unicode guys must have reasons to make "." as MidNumLet,
>> > not MidNum. IE must have reasons to make "." not to break words in
>> > titlecasing, and WebKit must have reasons to break. I'm not saying
>> > that Xaxio is wrong, but just that we still know little to make the
>> > decision to do it differently from what Unicode defines.
>> No algorithmic solution can get all the cases correct.  As Don Knuth
>> says in the comments to the English-language TeX hyphenation tables,
>> if you want bath-ing to hyphenate correctly you will have to live with
>> noth-ing (which is tolerable, especially for Americans) and anyth-ing
>> (which is not).  He ends by saying "You can't have every-thing."
>> The argument for sticking to Unicode rules is that they mostly get
>> it right and (importantly) *they already exist*.
>> --
>> John Cowan                                cowan@ccil.org
>> I amar prestar aen, han mathon ne nen,    http://www.ccil.org/~cowan
>> han mathon ne chae, a han noston ne 'wilith.  --Galadriel, LOTR:FOTR
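
The disagreement quoted above, over whether "." should break a word during
titlecasing (IE reportedly does not break, WebKit does), can be illustrated
with a small sketch. The Python below is only an approximation under stated
assumptions: both function names are hypothetical, the regular expressions are
a simplified stand-in for the UAX #29 word-boundary rules (only "." and "'"
between letters are treated as non-breaking), and uppercasing the first letter
stands in for a true Unicode titlecase mapping.

```python
import re

# One or more letters (Unicode-aware: word characters minus digits and "_").
LETTERS = r"[^\W\d_]+"

# Assumption: a "word" may contain "." or "'" when letters sit on both sides,
# loosely mimicking how MidNumLet characters do not break words in UAX #29.
WORD_KEEPING_MIDNUMLET = re.compile(LETTERS + r"(?:[.']" + LETTERS + r")*")

# Alternative behavior: every run of letters is its own word, so "." and "'"
# always start a new word.
WORD_BREAKING_ON_PUNCT = re.compile(LETTERS)


def _capitalize_match(match):
    # Simplification: upper() on the first letter, lower() on the rest.
    word = match.group(0)
    return word[0].upper() + word[1:].lower()


def titlecase_keep_midnumlet(text):
    """Titlecase without breaking words at '.' or "'" between letters."""
    return WORD_KEEPING_MIDNUMLET.sub(_capitalize_match, text)


def titlecase_break_on_punct(text):
    """Titlecase treating every run of letters as a separate word."""
    return WORD_BREAKING_ON_PUNCT.sub(_capitalize_match, text)


print(titlecase_keep_midnumlet("9 a.m. sharp"))   # 9 A.m. Sharp
print(titlecase_break_on_punct("9 a.m. sharp"))   # 9 A.M. Sharp
print(titlecase_keep_midnumlet("l'avion"))        # L'avion
print(titlecase_break_on_punct("l'avion"))        # L'Avion
```

Neither variant gives the preferred result for both inputs, which is
consistent with the point that no algorithmic rule gets every case right.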
Received on Monday, 21 February 2011 21:42:40 UTC
