- From: Mark Davis ☕ <mark@macchiato.com>
- Date: Mon, 21 Feb 2011 13:41:47 -0800
- To: Xaxio Brandish <xaxiobrandish@gmail.com>
- Cc: John Cowan <cowan@mercury.ccil.org>, Koji Ishii <kojiishi@gluesoft.co.jp>, Christoph Päper <christoph.paeper@crissov.de>, W3C style mailing list <www-style@w3.org>, "'WWW International' (www-international@w3.org)" <www-international@w3.org>
- Message-ID: <AANLkTi=LWT4fP40ZnR-hbBKcWxW9tKrcGNLjMoU2tDsh@mail.gmail.com>
In the Unicode Consortium, language-specific rules, such as those for titlecasing, fall under the CLDR technical committee <http://cldr.unicode.org/>. A ticket was filed some time ago for adding the structure and data, but it hasn't reached a high enough relative priority for the committee to work on. If the W3C is interested in this, I can bring it up on the committee agenda.

Mark

On Mon, Feb 21, 2011 at 13:08, Xaxio Brandish <xaxiobrandish@gmail.com> wrote:

> Good afternoon,
>
> I like the idea of sticking to the Unicode standard.
>
> Presenting the text that Koji (I hope that's respectful) mentioned,
>
>> Although limited, the case mapping process has some language dependencies. Some well known examples are Turkish and Greek. If the content language is known then any such language-specific rules must be used.
>
> it seems to me that these language-specific rules would include the a.m. and A.M. examples, as well as the French l'[....] example.
>
> I may be misreading the draft, but using the Unicode standardization schemes plus this idea means that the issue at hand is already covered in the document, but that it isn't being implemented correctly across all platforms. If the interest leans more toward conforming to existing implementations, then perhaps "MUST" can be changed to "SHOULD". I still vote for the "MUST" for headlining reasons, though.
>
> Building upon this possible misreading of the draft, the question becomes not whether "titlecase" should be redefined, but whether we should list the language-specific rules, which falls under Koji's suggestion #1 ("we're trying to do too much").
>
> As such, I agree with Koji's suggestions 3 and 3.1, but have more questions about 3.2 (preventing further extensions for the value). I'm not sure what an "extension of the value" means in this case (more values than already exist?).
>
> 1. Would it or would it not make more sense to call this value "titlecase" instead of "capitalize", since "capitalize" (according to Unicode 6 Section 5.18 that Koji mentioned) could refer to either the "titlecase" or the "uppercase" of digraph characters? This way, it stays closer to what is already defined in that specification.
>
> 2. Since the "language-specific rules must be used" text of the draft doesn't strictly keep with the Unicode standard (UAX #29 calls such rules a "tailoring"), should a value be added that reflects being able to use the Unicode titlecase ("titlecase"), as well as the CSS3 titlecase + language-specific rules ("capitalize")?
>
> --Xaxio
>
> On Mon, Feb 21, 2011 at 9:54 AM, John Cowan <cowan@mercury.ccil.org> wrote:
>
>> Koji Ishii scripsit:
>>
>>> Even for the "." (PERIOD) case Xaxio raised, I'm still skeptical about whether CSS should treat it differently from Unicode. I understand how "a.m." should be titlecased, but I haven't investigated whether there are any counter-cases, nor asked whether the Unicode folks considered that case. They must have had reasons to make "." MidNumLet rather than MidNum. IE must have reasons not to let "." break words in titlecasing, and WebKit must have reasons to break there. I'm not saying that Xaxio is wrong, just that we still know too little to decide to do it differently from what Unicode defines.
>>
>> No algorithmic solution can get all the cases correct. As Don Knuth says in the comments to the English-language TeX hyphenation tables, if you want bath-ing to hyphenate correctly you will have to live with noth-ing (which is tolerable, especially for Americans) and anyth-ing (which is not). He ends by saying "You can't have every-thing."
>>
>> The argument for sticking to Unicode rules is that they mostly get it right and (importantly) *they already exist*.
>>
>> --
>> John Cowan    cowan@ccil.org
>> I amar prestar aen, han mathon ne nen,    http://www.ccil.org/~cowan
>> han mathon ne chae, a han noston ne 'wilith.  --Galadriel, LOTR:FOTR
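Two technical points in this thread can be made concrete with a small sketch: whether "." should end a word when capitalizing "a.m.", and how a language tailoring (Turkish dotted i) differs from the default Unicode mappings. The sketch also touches the digraph point, since Python's `str.title()` applies the Unicode titlecase mapping, which differs from `str.upper()` for characters such as U+01C6. `capitalize_words`, its parameters, and the `TAILORED_TITLECASE` table are hypothetical names for illustration. This is a simplified approximation, not a UAX #29 segmenter and not how IE, WebKit, or CLDR actually implement it.

```python
# Hypothetical sketch: capitalize_words() is illustrative only. It is NOT a
# full UAX #29 word segmenter and NOT any browser's algorithm.

# Assumed tailoring table: only the Turkish/Azeri dotted i is modeled.
# SpecialCasing.txt maps U+0069 "i" to U+0130 for titlecase in "tr" and "az".
TAILORED_TITLECASE = {
    "tr": {"i": "\u0130"},
    "az": {"i": "\u0130"},
}


def capitalize_words(text: str, lang: str = "", period_breaks_words: bool = False) -> str:
    """Titlecase the first letter of each word; leave other letters unchanged.

    With period_breaks_words=False, a "." between two letters stays inside the
    word, as UAX #29 does for MidNumLet (rules WB6/WB7). With True, every "."
    ends the word. Any other non-letter character always ends the word here.
    """
    tailoring = TAILORED_TITLECASE.get(lang, {})
    out = []
    in_word = False  # True once a word has started and has not yet ended
    for i, ch in enumerate(text):
        if ch.isalpha():
            if in_word:
                out.append(ch)
            else:
                # Prefer the language tailoring; otherwise str.title() applies
                # the default Unicode titlecase mapping (e.g. U+01C6 -> U+01C5).
                out.append(tailoring.get(ch, ch.title()))
                in_word = True
        else:
            out.append(ch)
            next_is_letter = i + 1 < len(text) and text[i + 1].isalpha()
            keeps_word = (ch == "." and not period_breaks_words
                          and in_word and next_is_letter)
            if not keeps_word:
                in_word = False
    return "".join(out)


print(capitalize_words("meet at 10 a.m. today"))
# Meet At 10 A.m. Today
print(capitalize_words("meet at 10 a.m. today", period_breaks_words=True))
# Meet At 10 A.M. Today
print(capitalize_words("\u01c6emal"), "\u01c6emal".upper())
# ǅemal ǄEMAL
print(capitalize_words("istanbul"), capitalize_words("istanbul", lang="tr"))
# Istanbul İstanbul
```

With the UAX #29-like default, "a.m." comes out as "A.m.", the no-break behavior the thread attributes to IE; treating every period as a break gives "A.M.", the behavior attributed to WebKit. The Turkish case only changes when the content language is known, which is the "tailoring" the draft text refers to.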
Received on Monday, 21 February 2011 21:43:42 UTC