Re: [css3-text] text-transform:capitalize

From: Brady Duga <duga@ljug.com>
Date: Mon, 21 Feb 2011 16:47:31 -0800
Cc: Brady Duga <duga@ljug.com>, John Cowan <cowan@mercury.ccil.org>, Koji Ishii <kojiishi@gluesoft.co.jp>, Christoph Päper <christoph.paeper@crissov.de>, W3C style mailing list <www-style@w3.org>, "'WWW International' (www-international@w3.org)" <www-international@w3.org>
Message-Id: <20AACDB5-ED24-4456-AF5A-C09E8A1FAB80@ljug.com>
To: Xaxio Brandish <xaxiobrandish@gmail.com>
I was actually thinking that the term "titlecase" should be removed from the description and that we should continue using "capitalize". The term "titlecase" is poorly defined for strings, and what is being proposed (setting the first letter of every word to title case) is not what most Latin-based languages would consider title case. The current term, "capitalize", seems to reflect the algorithm more accurately.
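
To make the distinction concrete, here is a minimal Python sketch, not text from any draft. It assumes words are split on whitespace only, which is cruder than the Unicode word-boundary rules discussed in this thread. The names capitalize_words, editorial_title_case and SMALL_WORDS are hypothetical, and the small-word list is purely illustrative.

```python
import re

# Hypothetical, illustrative list of words that English editorial
# title case usually leaves in lowercase mid-title.
SMALL_WORDS = {"a", "an", "and", "of", "the", "to", "in", "on"}


def capitalize_words(text: str) -> str:
    """Apply the titlecase mapping to the first character of every
    whitespace-separated word and leave the rest untouched."""
    def fix(match: re.Match) -> str:
        word = match.group(0)
        # str.title() on a single character applies its titlecase mapping.
        return word[0].title() + word[1:]
    return re.sub(r"\S+", fix, text)


def editorial_title_case(text: str) -> str:
    """Rough English headline style: small words stay lowercase unless
    they are the first or last word."""
    words = text.split()
    out = []
    for i, word in enumerate(words):
        if 0 < i < len(words) - 1 and word.lower() in SMALL_WORDS:
            out.append(word.lower())
        else:
            out.append(word[0].title() + word[1:])
    return " ".join(out)


print(capitalize_words("the lord of the rings"))      # The Lord Of The Rings
print(editorial_title_case("the lord of the rings"))  # The Lord of the Rings
```

The first function is roughly the per-word algorithm under discussion; the second is closer to what an English reader would call title case, and it needs language-specific knowledge that the first does not.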

--Brady

On Feb 21, 2011, at 1:08 PM, Xaxio Brandish wrote:

> Good afternoon,
> 
> I like the idea of sticking to the Unicode standard.
> 
> Presenting the text that Koji (I hope that's respectful) mentioned,
> 
> Although limited, the case mapping process has some language dependencies. Some well known examples are Turkish and Greek. If the content language is known then any such language-specific rules must be used.
> 
> it seems to me that these language-specific rules would include the a.m. and A.M. examples, as well as the French l'[....] example.
> 
> I may be misreading the draft, but using the Unicode standardization schemes plus this idea means that the issue at hand is already covered in the document, but that it isn't being implemented correctly across all platforms. If the interest is more in conforming to existing implementations, then perhaps "MUST" can be changed to "SHOULD". I still vote for the "MUST" for headlining reasons, though.
> 
> Building upon this possible misread of the draft, the question becomes not whether "titlecase" should be redefined, but whether we should list the language-specific rules, which falls under Koji's suggestion #1 ("we're trying to do too much").
> 
> As such, I agree with Koji's suggestions 3 and 3.1, but have more questions about 3.2 (preventing further extensions for the value).  I'm not sure what an "extension of the value" means in this case (more values than already exist?).
> 
> 1. Would it or would it not make more sense to call this value "titlecase" instead of "capitalize", since "capitalize" (according to Unicode 6 Chapter 5.18 that Koji mentioned) could refer to either the "titlecase" or "uppercase" of digraph characters?  This way, it keeps more with what is already defined in that specification.
> 
> 2. Since the "language-specific rules must be used" text of the draft doesn't strictly keep with the Unicode standard (UAX #29 calls such rules a "tailoring"), should a value be added that reflects being able to use the Unicode titlecase ("titlecase"), as well as the CSS3 titlecase + language-specific rules ("capitalize")?
> 
> --Xaxio
> 
> 
> On Mon, Feb 21, 2011 at 9:54 AM, John Cowan <cowan@mercury.ccil.org> wrote:
> Koji Ishii scripsit:
> 
> > Even for "." (PERIOD) case Xaxio raised, I'm still skeptical whether
> > CSS should treat it differently from Unicode or not. I understand how
> > "a.m." should be titlecased, but I haven't investigated if there were
> > any counter-cases, nor asked if Unicode guys considered that case
> > or not. Unicode guys must have reasons to make "." as MidNumLet,
> > not MidNum. IE must have reasons to make "." not to break words in
> > titlecasing, and WebKit must have reasons to break. I'm not saying
> > that Xaxio is wrong, but just that we still know little to make the
> > decision to do it differently from what Unicode defines.
> 
> No algorithmic solution can get all the cases correct.  As Don Knuth
> says in the comments to the English-language TeX hyphenation tables,
> if you want bath-ing to hyphenate correctly you will have to live with
> noth-ing (which is tolerable, especially for Americans) and anyth-ing
> (which is not).  He ends by saying "You can't have every-thing."
> 
> The argument for sticking to Unicode rules is that they mostly get
> it right and (importantly) *they already exist*.
> 
> --
> John Cowan                                cowan@ccil.org
> I amar prestar aen, han mathon ne nen,    http://www.ccil.org/~cowan
> han mathon ne chae, a han noston ne 'wilith.  --Galadriel, LOTR:FOTR
> 
> 
Received on Tuesday, 22 February 2011 00:48:10 GMT