[css-text] text-transform:capitalize and Unicode digraphs

From: Jonathan Kew <jfkthame@gmail.com>
Date: Sun, 15 Mar 2015 18:50:56 +0000
Message-ID: <5505D490.4070507@gmail.com>
To: www-style list <www-style@w3.org>
Unicode includes a few digraph characters, each encoded as a single code 
point, such as "dz" (U+01F3) and "lj" (U+01C9), that have uppercase 
(DZ, U+01F1; LJ, U+01C7) and titlecase (Dz, U+01F2; Lj, U+01C8) 
equivalents. How should these be handled by text-transform:capitalize 
when they occur in word-initial position?

It's clear that the lowercase digraphs (dz) will be transformed according 
to their titlecase mapping (Dz), and that titlecase digraphs will be 
unchanged. But what should be done when the text contains an uppercase 
digraph such as DZ?
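
For reference, the mappings in question can be inspected with a short 
Python sketch. It relies only on the standard unicodedata module and the 
built-in str case methods; the DIGRAPHS table and the case_forms helper 
are names made up for this illustration.

```python
import unicodedata

# Upper, title and lower forms of two digraphs, each a single code point.
DIGRAPHS = {
    "DZ": ("\u01F1", "\u01F2", "\u01F3"),
    "LJ": ("\u01C7", "\u01C8", "\u01C9"),
}

def case_forms(ch: str) -> dict:
    """Return the general category and case mappings of one character."""
    return {
        "category": unicodedata.category(ch),  # "Lu", "Lt" or "Ll"
        "upper": ch.upper(),
        "title": ch.title(),
        "lower": ch.lower(),
    }

for name, forms in DIGRAPHS.items():
    for ch in forms:
        info = case_forms(ch)
        title = info["title"]
        print(f"{name} U+{ord(ch):04X} ({info['category']}) "
              f"titlecases to U+{ord(title):04X}")
```

The point to note is that the uppercase form (category Lu) has a 
titlecase mapping that differs from itself, which is what makes the 
question above non-trivial.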

By a strict reading of the current CSS Text draft[1]:

# 'capitalize'
#     Puts the first typographic letter unit of each word in titlecase; 
#     other characters are unaffected.

together with the Unicode standard, which gives Dz as the titlecase 
mapping for DZ, it appears that a word-initial uppercase digraph should 
be converted to its titlecase (mixed) form. This is the behavior I see 
in WebKit and Blink with an example like:

   data:text/html;charset=utf-8,<div style="text-transform:capitalize">DZa Dza dza

which renders all three "words" identically: "Dza Dza Dza". Gecko, in 
contrast, does NOT apply the titlecase mapping if the first letter is 
already uppercase, and so the example renders as "DZa Dza Dza".

Although the spec/WebKit/Blink behavior looks "better" for this 
(artificial) example, I would argue that Gecko's behavior is preferable. 
While the "DZa" result here does look poor, it makes little sense for an 
author to enter text in this form in the first place. In contrast, 
consider what happens if text that is originally entered as 
all-uppercase is subject to text-transform:capitalize:

   data:text/html;charset=utf-8,<div style="text-transform:capitalize">LJUBLJANA

Here, WebKit and Blink will render the word as "LjUBLJANA", while Gecko 
gives the (better) result "LJUBLJANA".
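
The difference between the two behaviors can be modelled roughly in 
Python. This is only a sketch, not how any engine is implemented: it 
assumes that words are runs of non-whitespace and that the first code 
point stands in for the first typographic letter unit. The function 
names are invented for the illustration.

```python
import re
import unicodedata

def _map_first_letters(text, transform):
    # Simplification: a "word" is a run of non-whitespace, and its first
    # code point is treated as the first typographic letter unit.
    def fix_word(match):
        word = match.group(0)
        return transform(word[0]) + word[1:]
    return re.sub(r"\S+", fix_word, text)

def capitalize_strict(text):
    """Always apply the titlecase mapping (strict reading of the draft)."""
    return _map_first_letters(text, lambda ch: ch.title())

def capitalize_keep_upper(text):
    """Leave an already-uppercase first letter alone (Gecko-like)."""
    def transform(ch):
        return ch if unicodedata.category(ch) == "Lu" else ch.title()
    return _map_first_letters(text, transform)

# Both test strings use the single-code-point digraphs.
mixed = "\u01F1a \u01F2a \u01F3a"   # DZa Dza dza
city = "\u01C7UB\u01C7ANA"          # LJUBLJANA

for sample in (mixed, city):
    print(capitalize_strict(sample), "|", capitalize_keep_upper(sample))
```

Under these assumptions the strict variant titlecases the leading 
uppercase digraph in both samples, while the Gecko-like variant leaves 
it untouched.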

IMO, this example -- where the entire word is uppercase -- seems more 
important than the case where an uppercase digraph has been used to 
begin an otherwise-lowercase word.

So I'd like to propose a minor change to the definition, something like:

# 'capitalize'
#     Puts the first typographic letter unit of each word in titlecase, 
#     unless it is already uppercase, in which case it is unchanged. 
#     Other characters are unaffected.

An alternative, perhaps even better, would be to make it contextual:

#     Puts the first typographic letter unit of each word in titlecase, 
#     unless it is already uppercase and is followed by another 
#     uppercase letter, in which case it is unchanged. Other characters 
#     are unaffected.
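
As a rough illustration of the contextual wording, here is a Python 
sketch. It makes the same simplifying assumptions as any toy model: 
words are runs of non-whitespace, the first code point stands in for 
the first typographic letter unit, and "uppercase" means general 
category Lu. The name capitalize_contextual is hypothetical.

```python
import re
import unicodedata

def _is_upper(ch):
    # Assumption: "uppercase" means Unicode general category Lu.
    return unicodedata.category(ch) == "Lu"

def capitalize_contextual(text):
    """Titlecase the first letter of each word unless it is uppercase
    and immediately followed by another uppercase letter."""
    def fix_word(match):
        word = match.group(0)
        first, rest = word[0], word[1:]
        if _is_upper(first) and rest and _is_upper(rest[0]):
            return word  # looks like an all-caps word: leave it alone
        return first.title() + rest
    return re.sub(r"\S+", fix_word, text)

print(capitalize_contextual("\u01F1a \u01F2a \u01F3a"))  # DZa Dza dza
print(capitalize_contextual("\u01C7UB\u01C7ANA"))        # LJUBLJANA
```

With this rule an uppercase digraph before a lowercase letter is still 
titlecased, while the all-uppercase word is left as entered.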

However, given that text-transform:capitalize is likely to remain a 
rather crude instrument -- it doesn't "know" about language-specific 
stop lists of small words that should not be capitalized, for example -- 
I don't think the additional implementation cost of making it 
context-dependent is worthwhile.

Feedback/comments welcomed....

JK


[1] http://dev.w3.org/csswg/css-text-3/#propdef-text-transform
Received on Sunday, 15 March 2015 18:51:25 UTC