- From: Jonathan Kew <jfkthame@gmail.com>
- Date: Mon, 23 Mar 2015 18:29:39 +0000
- To: www-style@w3.org
On 18/3/15 03:15, fantasai wrote:
> On 03/10/2015 11:29 AM, Richard Ishida wrote:
>>
>> i was wondering about how to treat enclosed alphanumerics when
>> text-align is set to capitalize.
>>
>> See the test results at
>> http://www.w3.org/International/tests/repo/results/text-transform
>>
>> wrt uppercase or lowercase transforms, the spec simply says "Puts all
>> letters in lowercase", or vice versa, and that seems to
>> me appropriate, for those characters that have Unicode mappings. The
>> tests text-transform-upperlower-026.html,
>> text-transform-upperlower-027.html indicate that this is what happens
>> across all major desktop browsers.
>>
>> For text-transform: capitalize, however, the spec says "Puts the first
>> *typographic letter unit* of each word in titlecase"
>> (my emphasis). As you can see in test
>> text-transform-capitalize-031.html, it makes sense when punctuation
>> and the like
>> precede the actual word of the text to look for the first real letter.
>> (All browsers pass that test.)
>>
>> it's not clear to me, however, whether a word that only consists of
>> enclosed alphanumerics (which don't fit the definition of
>> 'typgraphic letter unit'), or even one that starts with an enclosed
>> alphanumeric block character, should be not title cased:
>> see the results of text-transform-capitalize-026.html. Firefox
>> currently does not. Chrome and Safari, on the other hand do
>> titlecase per the Unicode data. IE titlecases everything except the
>> first word on the page.
>>
>> i can't imagine that people will want to do this very often, so this
>> seems much like an edge case, but i thought i'd ask the
>> question, all the same.
>>
>> what's the answer?
>
> I think we should go with whatever the Unicode case mapping files
> define, and adjust the CSS spec wording to match.

Sorry to keep beating on this issue, but I'm not sure that really answers the question here.

This isn't primarily about what's in the case mapping files -- which deal only with individual Unicode characters -- but about identifying the "typographic letter units" to which the case mapping should be applied.

The current CSS 'capitalize' transform is quite different in that regard from the toTitlecase(s) function[1] defined by Unicode, which as far as I can recall is the nearest parallel:

# R3 toTitlecase(X): Find the word boundaries in X according to
# Unicode Standard Annex #29, “Unicode Text Segmentation.” For each
# word boundary, find the first cased character F following the word
# boundary. If F exists, map F to Titlecase_Mapping(F); then map all
# characters C between F and the following word boundary to
# Lowercase_Mapping(C).

In general terms, the key issue here is that toTitlecase applies case mappings to all the letters of a word (Titlecase to the first, and Lowercase to the rest), whereas the CSS property applies Titlecase to the first letter and leaves the rest unchanged. Therefore, given content such as

    Ramsay MacDonald visits the USA

text-transform:capitalize will result in

    Ramsay MacDonald Visits The USA

whereas Unicode's toTitlecase() would give

    Ramsay Macdonald Visits The Usa

which I don't think is desirable.

Given this difference in approach, I think we should continue to let CSS Text define exactly what text-transform:capitalize does -- in particular, which characters it affects -- rather than delegating this to Unicode.

As Richard points out, the current draft of CSS Text excludes the enclosed alphanumerics ⓐⓑⓒ etc. from its definition of "typographic letter units", and therefore they should also be excluded from the "words" that 'capitalize' affects. IMO, that's the most reasonable option: these characters are more symbol- or dingbat-like than letter-like, as reflected in their Unicode General Category of "So". So I'd like the WG to confirm that this is the correct interpretation of the spec.
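
To make the contrast between the two approaches concrete, here is a minimal Python sketch. It is illustrative only, not what any engine implements: the function names are made up, words are split on spaces rather than by UAX #29, a "typographic letter unit" is approximated as a single code point whose General Category starts with L or N (an assumption about the spec's definition), and "cased" is approximated by checking whether upper() or lower() changes the character.

```python
import unicodedata


def is_letter_unit(ch):
    # Assumption: approximate a "typographic letter unit" by a single code
    # point in a Letter (L*) or Number (N*) General Category.
    return unicodedata.category(ch)[0] in ("L", "N")


def css_capitalize(text):
    # Titlecase the first letter unit of each word; leave the rest unchanged.
    words = []
    for word in text.split(" "):
        for i, ch in enumerate(word):
            if is_letter_unit(ch):
                word = word[:i] + ch.title() + word[i + 1:]
                break
        words.append(word)
    return " ".join(words)


def unicode_to_titlecase(text):
    # Rough rendering of R3: titlecase the first cased character of each
    # word, then lowercase everything after it up to the word boundary.
    words = []
    for word in text.split(" "):
        for i, ch in enumerate(word):
            if ch.upper() != ch or ch.lower() != ch:  # crude "cased" test
                word = word[:i] + ch.title() + word[i + 1:].lower()
                break
        words.append(word)
    return " ".join(words)


sample = "Ramsay MacDonald visits the USA"
print(css_capitalize(sample))        # Ramsay MacDonald Visits The USA
print(unicode_to_titlecase(sample))  # Ramsay Macdonald Visits The Usa

# Enclosed alphanumerics are General Category "So", so under the assumption
# above a word made only of them is left alone by css_capitalize.
print(unicodedata.category("ⓐ"))     # So
print(css_capitalize("ⓐⓑⓒ"))         # ⓐⓑⓒ
```

How a word that merely begins with an enclosed alphanumeric should be treated is exactly the open question; the sketch makes no claim about that case.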

A further issue that I don't think has been mentioned here relates to the 'uppercase' and 'lowercase' transforms. ISTM that these transforms, too, should only affect "letters" (or "typographic letter units", as CSS Text likes to call them) and should leave Symbol characters untouched, even though some Symbol characters -- by no means all the "enclosed letter-based" ones -- do have case mappings.

The CSS Text draft is less clear about this, inasmuch as it fails to link the term "letters" in 'uppercase' and 'lowercase' to a definition in the Terminology section (as earlier drafts did), but the only plausible interpretation I can see is that "letter" here is shorthand for "typographic letter unit", and so once again the Symbol characters are excluded.

AFAIK, all engines -- including Gecko, which gets 'capitalize' right by this interpretation -- currently mishandle this, and apply case mappings to Symbol characters. However, I doubt that changing our behavior to match the spec here is likely to "break the Web" in any substantial way, and it would put us in a more consistent and predictable state. (It would seem odd that 'text-transform:uppercase' affects ⓐⓑⓒ if 'text-transform:capitalize' does not; or that 'text-transform:lowercase' affects ⒶⒷⒸ but not 🅐🅑🅒.)

In summary, I think the CSS Text spec should maintain its definition of these transforms as applying only to letters, and should reinstate its link to the definition of "[typographic] letter [unit]" for 'uppercase' and 'lowercase' to reinforce this. An informative note could be added alerting implementers to the fact that some non-Letter characters have case mappings defined in Unicode, but should *not* be affected by these text-transform values.

JK

[1] http://www.unicode.org/versions/Unicode7.0.0/ch03.pdf, page 154.
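
As a postscript to the point about 'uppercase' and 'lowercase', the following Python sketch shows what "apply case mappings only to letters" could look like. It is a rough illustration under stated assumptions, not a description of any engine: the helper names are hypothetical, it works per code point rather than per typographic character unit, and it treats General Categories L* and N* as "letters".

```python
import unicodedata


def is_letter_unit(ch):
    # Assumption: "letter" means General Category L* or N*; Symbol
    # categories such as "So" are excluded.
    return unicodedata.category(ch)[0] in ("L", "N")


def uppercase_letters_only(text):
    # Apply the Unicode uppercase mapping to letters; pass the rest through.
    return "".join(ch.upper() if is_letter_unit(ch) else ch for ch in text)


def lowercase_letters_only(text):
    # Apply the Unicode lowercase mapping to letters; pass the rest through.
    return "".join(ch.lower() if is_letter_unit(ch) else ch for ch in text)


# Plain Unicode case mapping changes the circled letters, because they have
# case mappings despite being Symbols ...
print("abc ⓐⓑⓒ".upper())                    # ABC ⒶⒷⒸ
# ... while the letters-only variant leaves them untouched.
print(uppercase_letters_only("abc ⓐⓑⓒ"))    # ABC ⓐⓑⓒ
print(lowercase_letters_only("ABC ⒶⒷⒸ 🅐🅑🅒"))  # abc ⒶⒷⒸ 🅐🅑🅒
```

The last line shows the consistency argument: with a letters-only rule, ⒶⒷⒸ and 🅐🅑🅒 are treated alike, whereas plain lower() would change the former but not the latter.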
Received on Monday, 23 March 2015 18:30:08 UTC