
[UAX29] i18n comment 1: Grapheme terminology

From: <ishida@w3.org>
Date: Fri, 07 Mar 2008 11:27:58 +0000
To: public-i18n-core@w3.org
Message-Id: <20080307112431.7F0CA4F5D5@homer.w3.org>

Comment from the i18n review of:
http://www.unicode.org/reports/tr29/tr29-12.html

Comment 1
At http://www.w3.org/International/reviews/0801-uax29/
Editorial/substantive: E
Tracked by: RI

Location in reviewed document:
3 [http://www.unicode.org/reports/tr29/tr29-12.html#Grapheme_Cluster_Boundaries]

Comment: 
"To avoid ambiguity with the computer use of the term character, this is called a user-perceived character or a grapheme cluster."

 
Section 1 para 1 replaces 'grapheme clusters ("user-perceived characters")' with 'user-perceived characters', but should probably say 'grapheme clusters (also known as user-perceived characters)'.

 
Section 1 para 4 replaces 'grapheme clusters (what end users usually think of as characters)' with just 'characters'. This is incorrect.

 
Section 2 para 1 deletes 'grapheme clusters' and leaves 'user-perceived characters'.

 
Later we read:

 
"Note: Default grapheme clusters have been referred to as" 

 
This could point to a problem with terminology. Is 'default grapheme clusters' meant to include default grapheme clusters of both the extended and the existing types? I would have thought so, but the meaning of the text is not clear. You would need to say 'default grapheme clusters and extended default grapheme clusters' here to be clear (and elsewhere in the text, e.g. 4 paras later). Alternatively, you could rename the current 'default grapheme cluster' to 'minimal default grapheme cluster' and define 'default grapheme cluster' to refer to both the minimal and extended varieties, or you could simply use 'grapheme cluster' when you want to be non-specific.

 
This is very inconsistent.

 
We would like to see some rationalization of the terminology used throughout the section, and consistency in its application.

 
Terms should be clearly defined, and only one term should be used for one concept. The definitions should be easy for the reader to locate visually and to compare. We suggest a mini-glossary internal to section 3, or links from terms to a glossary at the end of the document.

 
In particular, the replacement of the term "grapheme cluster" with the term "character", starting in the introduction and proceeding through the document, seems to fly in the face of standard Unicode terminology and produces a significant problem. The term "character", as usually understood in Unicode contexts, refers to a logical character, i.e. a code point. Using it interchangeably with "grapheme cluster" introduces confusion.
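
To make the distinction concrete, here is a minimal Python sketch (not taken from UAX #29) that counts the same string both ways. The helper name naive_clusters is hypothetical, and its rule (attach every code point with a non-zero canonical combining class to the preceding base) is a deliberate simplification of the grapheme cluster boundary rules in the reviewed document, assumed here only for illustration.

```python
import unicodedata


def naive_clusters(text):
    """Group each base code point with the combining marks that follow it.

    Simplified illustration only: the real default grapheme cluster rules
    in UAX #29 cover more cases (Hangul syllables, CR LF, and so on).
    """
    clusters = []
    for ch in text:
        # A non-zero canonical combining class marks a combining character.
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# 'e' followed by U+0301 COMBINING ACUTE ACCENT, then 'a'
sample = "e\u0301a"
code_points = len(sample)          # Python strings count code points
clusters = naive_clusters(sample)  # what a user would perceive as characters

print(code_points, len(clusters))  # 3 code points, 2 clusters
```

The two counts differ for the same text, which is why using "character" for both concepts is ambiguous.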

 
Received on Friday, 7 March 2008 11:24:39 GMT
