- From: Florian Rivoal <florianr@opera.com>
- Date: Wed, 14 Dec 2011 15:27:27 +0100
- To: www-style@w3.org
On Wed, 14 Dec 2011 14:48:28 +0100, MURATA Makoto <eb2m-mrt@asahi-net.or.jp> wrote:

> If grapheme clusters, word boundaries, and Unicode normalizations are
> incorporated, the result will be very complicated.

The idea of using grapheme clusters is to make it magically do the right
thing for authors. The word itself would scare authors away, but the behavior
would simply make authors' intuitive understanding of what a character is
match what the transform considers a character.

As for word boundaries and Unicode normalizations, I am not quite sure how
you can be convinced they would make the feature hard to use before we have
even decided what they are supposed to do and how they should work.

> Note that Unicode
> regular expressions Level 1 (Unicode Technical Standard #18)
> significantly simplifies grapheme clusters and word boundaries.

Thanks for the link, I'll read up on that.

> The smallest generic solution is one-to-one mapping of UCS code values.
> It would be a small subset of your "convert". I think that it would be
> very appropriate as Level 1 of text transformation.

Operating on single Unicode code points isn't a simpler subset of operating
on grapheme clusters, but rather an incompatible variant (a short
illustration follows below). Maybe using grapheme clusters here is wrong and
we should go for single code points instead, but we should not rush into
using one definition of "character" while suspecting we'll eventually want to
use another one, as that would break content built against the earlier
definition.

- Florian
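As a quick sketch of why the two definitions diverge (this is only an illustration, not the proposed CSS feature): assume a hypothetical author-supplied one-to-one mapping "e" → "a" and the decomposed string U+0065 U+0301 ("é"). Applied per code point, the mapping rewrites the base letter; applied per grapheme cluster, it leaves the text alone, because the unit "e" + combining acute never equals plain "e". The Python below relies on the third-party regex module, whose \X pattern matches an extended grapheme cluster.

```python
import regex  # third-party module; supports \X for grapheme clusters

# Hypothetical author-supplied one-to-one mapping.
mapping = {"e": "a"}

text = "e\u0301"  # "é" as LATIN SMALL LETTER E + COMBINING ACUTE ACCENT

# Per code point: the base letter "e" matches, so the result is "a" + accent.
per_code_point = "".join(mapping.get(cp, cp) for cp in text)

# Per grapheme cluster: the unit is "e\u0301", which has no mapping,
# so the text is unchanged.
per_cluster = "".join(mapping.get(g, g) for g in regex.findall(r"\X", text))

print(per_code_point)  # "á" (decomposed)
print(per_cluster)     # "é" (unchanged)
```

The two results differ, so content authored against one definition of "character" would not survive a later switch to the other.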
Received on Wednesday, 14 December 2011 14:28:03 UTC