[css3-text] @text-transform and clusters (was [css3-text] Splitting CSS Text into Level 3 and Level 4) from John Daggett on 2011-12-15 (www-style@w3.org from December 2011)

From: John Daggett <jdaggett@mozilla.com>
Date: Wed, 14 Dec 2011 20:57:08 -0800 (PST)
To: MURATA Makoto <eb2m-mrt@asahi-net.or.jp>
Cc: www-style@w3.org
Message-ID: <1217128372.47498.1323925028847.JavaMail.root@zimbra1.shared.sjc1.mozilla.com>

Makoto Murata wrote:

> If grapheme clusters, word boundaries, and Unicode normalizations are
> incorporated, the result will be very complicated.  Note that Unicode
> regular expressions Level 1 (Unicode Technical Standard #18)
> significantly simplifies grapheme clusters and word boundaries.
> 
> The smallest generic solution is one-to-one mapping of UCS code
> values. It would be a small subset of your "convert".  I think that it
> would be very appropriate as Level 1 of text transformation.

I think this whole issue is a bit of a red herring.  Yes, it would be
better if the wording explicitly stated what to do in the presence of
combining characters.  But the same is true elsewhere: Selectors, for
example, doesn't describe how identifiers that use combining characters
are matched.  We also had a nice long discussion of normalization as
part of Selectors 3, and I think the conclusion was that it's not a
problem in practice.  I think the same is true here.

I should also point out that this is already an issue with the way CSS3
Text defines the text-transform property itself: there's no description
of whether normalization should occur in the presence of combining
characters.  My guess is that all user agents today transform only base
characters, without doing any normalization, so that <base> +
<combining> simply becomes T<base> + <combining>, where T<base> is the
transformed base character.
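
To make that guess concrete, here is a minimal Python sketch.  It is an
illustration, not any user agent's actual code, and it assumes that the
uppercase transform is applied per code point to non-combining
characters while combining marks pass through untouched.  The function
name transform_base_chars is hypothetical.

```python
import unicodedata


def transform_base_chars(text: str) -> str:
    """Uppercase each base (non-combining) code point and pass combining
    marks through unchanged.  No Unicode normalization is performed."""
    out = []
    for ch in text:
        if unicodedata.combining(ch):
            # Nonzero canonical combining class: a combining mark, kept as-is.
            out.append(ch)
        else:
            # Base character: apply the transform (here, uppercase).
            out.append(ch.upper())
    return "".join(out)


precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # e + COMBINING ACUTE ACCENT

print(ascii(transform_base_chars(precomposed)))  # '\xc9'
print(ascii(transform_base_chars(decomposed)))   # 'E\u0301'
```

The two results are different code point sequences, but they are
canonically equivalent (NFC of "E\u0301" is "\u00c9").  That is the
<base> + <combining> -> T<base> + <combining> behaviour, with no
normalization in either direction.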

I think a very simple version of @text-transform could be defined
within the CSS3 Text timeframe.  But we won't know unless we try.  A
simple one-to-one character mapping is the way to go, as Murata-san
suggests.  It wouldn't be a terrible thing to simply say that, at this
level, transforms defined with @text-transform apply only to the base
characters within clusters and that no normalization is assumed or
required, just as user agents already do for the other predefined
transforms.  We can deal with support for more complex situations
later, based partly on whether or not there are real issues in
practice.
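
The rule proposed above can be sketched in Python.  This is a sketch
under stated assumptions, not proposed spec text.  MAPPING stands in
for a hypothetical table that an @text-transform rule might declare,
one code point to one code point.  "Cluster" is simplified here to a
base code point followed by any code points with a nonzero canonical
combining class.  That is not full UAX #29 grapheme cluster
segmentation.  The names MAPPING, split_clusters and apply_one_to_one
are hypothetical.

```python
import unicodedata

# Hypothetical one-to-one table an @text-transform rule might declare.
MAPPING = {
    "a": "\uff41",       # a -> FULLWIDTH LATIN SMALL LETTER A
    "\u00e9": "\u00c9",  # precomposed e-acute -> precomposed E-acute
}


def split_clusters(text: str) -> list[str]:
    """Simplified clusters: a base code point plus any following
    combining marks (nonzero canonical combining class)."""
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch   # attach the mark to the preceding base
        else:
            clusters.append(ch)  # start a new cluster
    return clusters


def apply_one_to_one(text: str, mapping: dict[str, str]) -> str:
    """Map only the base character of each cluster; marks pass through.
    No normalization: precomposed and decomposed input match separately."""
    out = []
    for cluster in split_clusters(text):
        base, marks = cluster[0], cluster[1:]
        out.append(mapping.get(base, base) + marks)
    return "".join(out)


print(ascii(apply_one_to_one("a\u0301", MAPPING)))  # '\uff41\u0301'
print(ascii(apply_one_to_one("\u00e9", MAPPING)))   # '\xc9'
print(ascii(apply_one_to_one("e\u0301", MAPPING)))  # 'e\u0301' (no match)
```

The last case shows the cost of leaving normalization out.  A mapping
declared for precomposed U+00E9 does not fire on the decomposed
sequence e + U+0301.  Under the proposal that is acceptable at this
level, and it can be revisited if it turns out to matter in practice.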

Regards,

John Daggett

Received on Thursday, 15 December 2011 04:57:37 UTC