- From: Richard Ishida <ishida@w3.org>
- Date: Thu, 07 Aug 2014 14:35:47 +0100
- To: www-international@w3.org
- CC: "CSS WWW Style (www-style@w3.org)" <www-style@w3.org>
Thank you for your work on this. The i18n WG is now happy to close this issue.

RI

>> On 24/01/2014 22:26, Phillips, Addison wrote:
>>>> The definition of "grapheme cluster" in the Unicode Glossary defers
>>>> to UAX 29, but the current revision (23) of that UAX doesn't
>>>> actually have a formal definition of "grapheme cluster", except as
>>>> a cover term for default grapheme clusters, extended grapheme
>>>> clusters, and tailored grapheme clusters, which *are* defined.
>>>>
>>>> It does, however, introduce the informal term "user-perceived
>>>> character", and says that grapheme clusters (by implication, of one
>>>> of the above varieties) are an approximation to user-perceived
>>>> characters.
>>>
>>> The specific quote I think you refer to is:
>>>
>>> --
>>> It is important to recognize that what the user thinks of as a
>>> "character"—a basic unit of a writing system for a language—may not be
>>> just a single Unicode code point. Instead, that basic unit may be made
>>> up of multiple Unicode code points. To avoid ambiguity with the
>>> computer use of the term character, this is called a user-perceived
>>> character. For example, “G” + acute-accent is a user-perceived
>>> character: users think of it as a single character, yet is actually
>>> represented by two Unicode code points. These user-perceived
>>> characters are approximated by what is called a grapheme cluster,
>>> which can be determined programmatically.
>>> --
>>>
>>>>
>>>> This seems to me like good terminology to follow.
>>>>
>>>
>>> The challenge here is that Unicode (and CSS) both define the term
>>> "character" to have a specific meaning equivalent to a Unicode
>>> codepoint, i.e. the "computer use" of the term. CSS3 Text, however,
>>> attempts to redefine and then use the term "character" to also mean a
>>> "user-perceived character". The use of the word "character" after that
>>> point is somewhat haphazard, leading to a number of problems in
>>> understanding the spec. Our primary comment is that we'd prefer to see
>>> a term other than (unadorned) "character" used where "user-perceived
>>> character" is intended.
>>>
>>> I agree that we could use "user-perceived character" instead of
>>> "grapheme cluster". My reservation about that is that a "grapheme
>>> cluster" (of various flavors and stripes) can be "determined
>>> programmatically", which is a consideration for implementation. If the
>>> "user-perceived character" cannot be determined programmatically, it
>>> is not possible to do much with it in terms of CSS. Hence, I think
>>> using the [whatever] "grapheme cluster" terminology is useful here
>>> because that is the unit that CSS will actually operate on in the
>>> cases where "user-perceived character" is intended.
>>>
>>> The ending part of my comment (which grew out of WG discussion):
>>>
>>>> ... Rather, we should say that applications sometimes require
>>>> additional rules beyond the use of 'grapheme clusters' in order to
>>>> handle the typographic traditions of particular scripts.
>>>
>>> ... suggests that some scripts require "tailored grapheme clusters"
>>> (we're aware of claims of Indic script or language requirements in
>>> this regard) but for which there is no fully-defined tailoring to
>>> point to.
>>>
>>> HTH,
>>>
>>> Addison
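
To make the code point versus user-perceived character distinction in the quoted text concrete, here is a minimal Python sketch. It is not part of the original thread. The helper name `approx_clusters` is hypothetical, and its grouping rule (attach any code point whose general category starts with "M" to the preceding code point) is a deliberately simplified stand-in, not the default or extended grapheme cluster algorithm of UAX 29.

```python
import unicodedata


def approx_clusters(text):
    """Group each code point with the combining marks that follow it.

    Assumption: this is a rough illustration only. It ignores Hangul
    syllables, emoji sequences, CRLF and the other UAX 29 rules.
    """
    clusters = []
    for ch in text:
        # General categories Mn, Mc and Me are combining/enclosing marks.
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# "G" + COMBINING ACUTE ACCENT (U+0301), followed by a plain "a".
sample = "G\u0301a"
clusters = approx_clusters(sample)

print(len(sample))    # 3 code points
print(len(clusters))  # 2 approximate user-perceived characters
```

The point of the sketch is only that a unit larger than the code point can be computed mechanically; which exact unit a specification should name is the question the thread discusses.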
Received on Thursday, 7 August 2014 13:36:17 UTC