- From: Richard Ishida <ishida@w3.org>
- Date: Fri, 7 Mar 2008 16:40:17 -0000
- To: "'Mark Davis'" <mark.davis@icu-project.org>
- Cc: <public-i18n-core@w3.org>
Here are some concrete proposals for text change (most just copied from below):

a. Last sentence in para 4 of section 3.0: clusters -> cluster

b. section 1 para 4 should say "…significant boundaries in text: user-perceived characters, words, …"

c. Section 3 para 6, first sentence: I suggest "These algorithms can be adapted to produce *tailored grapheme clusters* for specific locales or other customizations, such as the contractions used in collation tailoring tables. Below are some examples of the differences between these concepts."

d. I would suggest that the para that begins "Grapheme clusters can be tailored to meet further requirements." could be changed to mirror earlier text with "A *tailored grapheme cluster* uses customizations of the Unicode rules to meet further requirements."

RI

============
Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)

http://www.w3.org/International/
http://rishida.net/blog/
http://rishida.net/

> -----Original Message-----
> From: public-i18n-core-request@w3.org
> [mailto:public-i18n-core-request@w3.org] On Behalf Of Richard Ishida
> Sent: 07 March 2008 14:13
> To: public-i18n-core@w3.org
> Subject: RE: [UAX29] i18n comment 1: Grapheme terminology
>
> New text is MUCH much better! Eliminated default as part of a name,
> highlighted the terms, use Grapheme Cluster for the general case, and
> Extended Grapheme Cluster and Legacy Grapheme Cluster for the subtypes,
> and used general term appropriately, not as short form. User-perceived
> character used consistently and defined clearly as a separate thing
> from a grapheme cluster.
>
> Last sentence in para 4 of section 3.0: clusters -> cluster
>
> I think section 1 para 4 should say "…significant boundaries in text:
> user-perceived characters, words, …"
>
> Is it worth saying, in the initial setup, that there are *3* types of
> grapheme cluster: legacy GC, extended GC, and tailored GC ? Since
> that's really the division.
> This may be a slightly different way of seeing the world compared to
> that in the note near the end of 3.0, but I think it makes sense. In
> fact, it has already been done in table 1a.
>
> I would suggest that the para that begins "Grapheme clusters can be
> tailored to meet further requirements." could be changed to mirror
> earlier text with "A *tailored grapheme cluster* uses customizations of
> the Unicode rules to meet further requirements."
>
> RI
>
> ============
> Richard Ishida
> Internationalization Lead
> W3C (World Wide Web Consortium)
>
> http://www.w3.org/International/
> http://rishida.net/blog/
> http://rishida.net/
>
> > -----Original Message-----
> > From: public-i18n-core-request@w3.org
> > [mailto:public-i18n-core-request@w3.org] On Behalf Of ishida@w3.org
> > Sent: 07 March 2008 11:28
> > To: public-i18n-core@w3.org
> > Subject: [UAX29] i18n comment 1: Grapheme terminology
> >
> > Comment from the i18n review of:
> > http://www.unicode.org/reports/tr29/tr29-12.html
> >
> > Comment 1
> > At http://www.w3.org/International/reviews/0801-uax29/
> > Editorial/substantive: E
> > Tracked by: RI
> >
> > Location in reviewed document:
> > 3 [http://www.unicode.org/reports/tr29/tr29-12.html#Grapheme_Cluster_Boundaries]
> >
> > Comment:
> > "To avoid ambiguity with the computer use of the term character, this
> > is called a user-perceived character or a grapheme cluster.".
> >
> > Section 1 para 1 replaces 'grapheme clusters ("user-perceived
> > characters")' with 'user-perceived characters', but should probably
> > say 'grapheme clusters (also known as user-perceived characters)'.
> >
> > S1 para 4 replaces 'grapheme clusters (what end users usually think of
> > as characters)' with just 'characters'. This is incorrect.
> >
> > S2 para1 deletes 'grapheme clusters' and leaves 'user-perceived
> > characters'.
> >
> > Later we read:
> >
> > "Note: Default grapheme clusters have been referred to as"
> >
> > This could point to a problem with terminology. Is 'default grapheme
> > clusters' meant to include default grapheme clusters of the extended
> > and existing types? I would have thought so, but the meaning of the
> > text is not clear. You'd need to say 'default grapheme clusters and
> > extended default grapheme clusters' here to be clear (and elsewhere in
> > the text, eg. 4 paras later). We could rename the current 'default
> > grapheme cluster' to 'minimal default grapheme cluster' and define
> > 'default grapheme cluster' to refer to both the minimal and extended
> > varieties, or you could simply use 'grapheme cluster' when you want to
> > be non-specific.
> >
> > This is very inconsistent.
> >
> > We would like to see some rationalization of the terminology used
> > throughout the section, and consistency in its application.
> >
> > Terms should be clearly defined, and only one term should be used for
> > one concept. The definitions should be easy for the reader to locate
> > visually, and compare. We suggest a mini-glossary internal to section
> > 3 or links on terms to a glossary at the end of the document.
> >
> > In particular, the replacement of the term "grapheme cluster" with
> > term "character", starting in the introduction and proceeding through
> > the document, seems to fly in the face of standard Unicode terminology
> > and produces a significant problem. The term "character", as usually
> > understood in Unicode contexts, refers to a logical character i.e. a
> > code point. By using the term interchangeably with "grapheme cluster",
> > we introduce confusion.
Received on Friday, 7 March 2008 16:37:06 UTC