- From: Glenn Adams <glenn@skynav.com>
- Date: Sat, 25 Feb 2012 10:26:11 -0700
- To: Boris Zbarsky <bzbarsky@mit.edu>
- Cc: public-script-coord@w3.org
- Message-ID: <CACQ=j+duYnipkROuBXoiReiojwOL9w3mK0_xjruc27F+uT7xoA@mail.gmail.com>
On Sat, Feb 25, 2012 at 9:52 AM, Boris Zbarsky <bzbarsky@mit.edu> wrote:

> On 2/25/12 11:19 AM, Glenn Adams wrote:
>
>> To answer Anne, I concur that Unicode scalar values (also known as
>> Unicode code points) as opposed to encoded coding elements, i.e., code
>> units, e.g., 16-bit units of UTF-16, are the correct choice. Grapheme
>> clusters remain in the text processing (i.e., abstract character)
>> domain, and not the encoded character domain.
>
> I believe Anne's point is that we are in fact talking about text
> processing here, throughout this discussion, so grapheme clusters seem like
> the right thing to be talking about...

My apologies for not having followed this long thread (I just joined this ML, in fact), but I did read the original posting [1], and it appears to be about a simple idea: to transition from 16-bit encoding units to Unicode scalar values as the access units for ES strings.

[1] http://lists.w3.org/Archives/Public/public-script-coord/2012JanMar/0194.html

On its own, I support such a transition. However, I believe it would be unwise to introduce graphemes or grapheme clusters into this transition.

The motivation for making the transition can be stated simply: a desire to support all Unicode abstract characters in a simple string construct without having to deal with surrogate pairs.

Of course, a secondary motivation is to simplify (the domain and range of) text processing functions, but that should be a second-order consideration, and I would suggest that introducing grapheme clusters (which certainly do have a role at the text processing layer) is best avoided in characterizing this possible transition.

G.
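
To make the three access units under discussion concrete, here is a minimal sketch in present-day ECMAScript. It relies on `String.prototype.codePointAt` and the string iterator, which were added in ES2015 and so were not available when this message was written. The sample strings and variable names are illustrative assumptions, not taken from the thread. Note that the iterator yields code points, which coincide with scalar values only for well-formed strings (no lone surrogates).

```javascript
// U+1D11E MUSICAL SYMBOL G CLEF lies outside the BMP,
// so UTF-16 represents it as a surrogate pair.
const clef = "\uD834\uDD1E";

// Access by 16-bit code unit: this is what `length` and indexing expose.
const codeUnits = [];
for (let i = 0; i < clef.length; i++) {
  codeUnits.push(clef.charCodeAt(i));
}

// Access by code point: the string iterator joins surrogate pairs.
const scalarValues = Array.from(clef, (ch) => ch.codePointAt(0));

// A grapheme cluster can still span several scalar values:
// "e" followed by U+0301 COMBINING ACUTE ACCENT is one user-perceived character.
const eAcute = "e\u0301";
const eAcuteScalars = Array.from(eAcute, (ch) => ch.codePointAt(0));

console.log(codeUnits.map((u) => u.toString(16)).join(", "));    // d834, dd1e
console.log(scalarValues.map((v) => v.toString(16)).join(", ")); // 1d11e
console.log(eAcuteScalars.length);                               // 2
```

The last case shows why grapheme clusters are a separate, text-processing-level question: moving from code units to scalar values removes surrogate pairs from view, but combining sequences remain multiple units either way.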
Received on Saturday, 25 February 2012 17:26:59 UTC