agenda+ RE: [css3-text] text-transform:capitalize from Phillips, Addison on 2011-02-22 (public-i18n-core@w3.org from January to March 2011)

From: Phillips, Addison <addison@lab126.com>
Date: Mon, 21 Feb 2011 16:05:31 -0800
To: "public-i18n-core@w3.org" <public-i18n-core@w3.org>
Message-ID: <C7A5719F1E562149BA9171F58BEE2CA412CBA78887@EX-IAD6-B.ant.amazon.com>
[+public-i18n-core]

Several W3C working groups, notably the folks in Style (not *just* CSS, but also XSL-FO), would be interested in such data: text transforms of this sort are often desirable. I’d like to add this to our agenda for this week.

Addison Phillips
Globalization Architect (Lab126)
Chair (W3C I18N, IETF IRI WGs)

Internationalization is not a feature.
It is an architecture.



From: www-international-request@w3.org [mailto:www-international-request@w3.org] On Behalf Of Mark Davis
Sent: Monday, February 21, 2011 1:42 PM
To: Xaxio Brandish
Cc: John Cowan; Koji Ishii; Christoph Päper; W3C style mailing list; 'WWW International' (www-international@w3.org)
Subject: Re: [css3-text] text-transform:capitalize

In the Unicode Consortium, language-specific rules, such as those for titlecasing, fall under the CLDR technical committee <http://cldr.unicode.org/>. A ticket for adding structure and data was filed some time ago, but it hasn't reached a high enough relative priority for the committee to work on. If the W3C is interested in this, I can bring it up on the committee agenda.

Mark

On Mon, Feb 21, 2011 at 13:08, Xaxio Brandish <xaxiobrandish@gmail.com> wrote:
Good afternoon,

I like the idea of sticking to the Unicode standard.

Presenting the text that Koji (I hope that's respectful) mentioned,
Although limited, the case mapping process has some language dependencies. Some well known examples are Turkish and Greek. If the content language is known then any such language-specific rules must be used.

it seems to me that these language-specific rules would include the a.m. and A.M. examples, as well as the French l'[....] example.
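As a rough illustration of the language dependency in the quoted draft text, here is a minimal Python sketch. The helper name `upper_for_lang` is hypothetical (it is not from the draft or from any library), and it handles only the Turkish/Azeri dotted and dotless i; real tailorings cover more cases than this.

```python
def upper_for_lang(text: str, lang: str = "") -> str:
    """Uppercase text, applying a Turkish/Azeri tailoring when lang asks for it.

    Hypothetical helper, for illustration only.
    """
    if lang in ("tr", "az"):
        # Tailoring: dotted i maps to capital I with dot above (U+0130),
        # and dotless i (U+0131) maps to plain capital I.
        text = text.replace("i", "\u0130").replace("\u0131", "I")
    # str.upper() applies Python's default, language-independent mapping.
    return text.upper()


print(upper_for_lang("istanbul"))        # default mapping
print(upper_for_lang("istanbul", "tr"))  # Turkish tailoring
```

Without a known content language the first call falls back to the default mapping, which is the situation the draft's "if the content language is known" wording allows for.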

I may be misreading the draft, but combining the Unicode standardization schemes with this idea means that the issue at hand is already covered in the document; it just isn't being implemented correctly across all platforms. If the interest is more in conforming to existing implementations, then perhaps "MUST" can be changed to "SHOULD". I still vote for the "MUST" for headlining reasons, though.

Building upon this possible misreading of the draft, the question becomes not whether "titlecase" should be redefined, but whether we should list the language-specific rules, which falls under Koji's suggestion #1 ("we're trying to do too much").

As such, I agree with Koji's suggestions 3 and 3.1, but have more questions about 3.2 (preventing further extensions for the value).  I'm not sure what an "extension of the value" means in this case (more values than already exist?).

1. Would it or would it not make more sense to call this value "titlecase" instead of "capitalize", since "capitalize" (according to Unicode 6 Chapter 5.18 that Koji mentioned) could refer to either the "titlecase" or the "uppercase" of digraph characters?  This way, it stays closer to what is already defined in that specification.

2. Since the "language-specific rules must be used" text of the draft doesn't strictly keep with the Unicode standard (UAX #29 calls such rules a "tailoring"), should a value be added that reflects being able to use the Unicode titlecase ("titlecase"), as well as the CSS3 titlecase + language-specific rules ("capitalize")?
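To make the digraph point in question 1 concrete, here is a small Python sketch. It relies only on Python's built-in `str.upper()` and `str.title()`, which use the default Unicode case mappings; the choice of U+01C6 as the sample character is mine, not something from the draft.

```python
import unicodedata

DZ_SMALL = "\u01c6"  # LATIN SMALL LETTER DZ WITH CARON, a single digraph character

upper = DZ_SMALL.upper()  # uppercase mapping: both letters of the digraph are capital
title = DZ_SMALL.title()  # titlecase mapping: only the first letter is capital

for label, ch in (("upper", upper), ("title", title)):
    print(label, f"U+{ord(ch):04X}", unicodedata.name(ch))
```

The two results are different code points (U+01C4 and U+01C5), which is why "capitalize" alone is ambiguous for such characters.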

--Xaxio

On Mon, Feb 21, 2011 at 9:54 AM, John Cowan <cowan@mercury.ccil.org> wrote:
Koji Ishii scripsit:

> Even for "." (PERIOD) case Xaxio raised, I'm still skeptical whether
> CSS should treat it differently from Unicode or not. I understand how
> "a.m." should be titlecased, but I haven't investigated if there were
> any counter-cases, nor asked if Unicode guys considered that case
> or not. Unicode guys must have reasons to make "." as MidNumLet,
> not MidNum. IE must have reasons to make "." not to break words in
> titlecasing, and WebKit must have reasons to break. I'm not saying
> that Xaxio is wrong, but just that we still know little to make the
> decision to do it differently from what Unicode defines.

No algorithmic solution can get all the cases correct.  As Don Knuth
says in the comments to the English-language TeX hyphenation tables,
if you want bath-ing to hyphenate correctly you will have to live with
noth-ing (which is tolerable, especially for Americans) and anyth-ing
(which is not).  He ends by saying "You can't have every-thing."

The argument for sticking to Unicode rules is that they mostly get
it right and (importantly) *they already exist*.
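
For readers who want to see the "a.m." difference discussed above, here is a minimal Python sketch. It is an approximation under stated assumptions: the regular expression only imitates the part of the Unicode default word rules that keeps a "." or "'" between two letters inside a word, and both function names are hypothetical.

```python
import re

# Roughly "one or more letters": word characters minus digits and underscore.
LETTERS = r"[^\W\d_]+"

# Simplified stand-in for the default word rules: a run of letters, optionally
# continued across a single "." or "'" that sits between letters.
# This is an assumption for illustration, not the full UAX #29 algorithm.
WORD = re.compile(LETTERS + r"(?:[.']" + LETTERS + r")*")


def _cap(match: re.Match) -> str:
    """Titlecase the first character of the match and lowercase the rest."""
    word = match.group(0)
    return word[0].title() + word[1:].lower()


def titlecase_unicode_like(text: str) -> str:
    """Keep "." and "'" between letters inside one word."""
    return WORD.sub(_cap, text)


def titlecase_break_on_punct(text: str) -> str:
    """Treat every non-letter, including "." and "'", as a word break."""
    return re.sub(LETTERS, _cap, text)


for sample in ("9 a.m. sharp", "l'avion"):
    print(titlecase_unicode_like(sample), "|", titlecase_break_on_punct(sample))
```

Under these assumptions the first function yields "9 A.m. Sharp" and "L'avion", while the second yields "9 A.M. Sharp" and "L'Avion"; neither gets both inputs right, which is the trade-off described above.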

--
John Cowan                                cowan@ccil.org
I amar prestar aen, han mathon ne nen,    http://www.ccil.org/~cowan
han mathon ne chae, a han noston ne 'wilith.  --Galadriel, LOTR:FOTR
Received on Tuesday, 22 February 2011 00:06:06 UTC