Re: New full Unicode for ES6 idea

On Sat, Feb 25, 2012 at 9:52 AM, Boris Zbarsky <bzbarsky@mit.edu> wrote:

> On 2/25/12 11:19 AM, Glenn Adams wrote:
>
>> To answer Anne, I concur that Unicode scalar values (also known as
>> Unicode code points) as opposed to encoded coding elements, i.e., code
>> units, e.g., 16-bit units of UTF-16, are the correct choice. Grapheme
>> clusters remain in the text processing (i.e., abstract character)
>> domain, and not the encoded character domain.
>>
>
> I believe Anne's point is that we are in fact talking about text
> processing here, throughout this discussion, so grapheme clusters seem like
> the right thing to be talking about...


My apologies for not having followed this long thread (I have only just
joined this mailing list), but I did read the original posting [1], and it
appears to come down to a simple idea: to transition from 16-bit code units
to Unicode scalar values as the access units for ES strings.

[1]
http://lists.w3.org/Archives/Public/public-script-coord/2012JanMar/0194.html

On its own, I support such a transition. However, I believe it would be
unwise to introduce graphemes or grapheme clusters into this transition.

The motivation for making a transition can simply be stated as a desire to
easily support all Unicode abstract characters in a simple string construct
without having to deal with surrogate pairs.
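
To make the surrogate-pair burden concrete, here is a minimal sketch of
what script authors have to do while strings are indexed by 16-bit code
units. The helper name toCodePoints and the sample character U+1D11E are
my own choices for illustration and are not taken from the proposal.

```typescript
// Hypothetical helper (not part of any proposal): decode a string's 16-bit
// code units into code points, combining well-formed surrogate pairs.
function toCodePoints(s: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < s.length; i++) {
    const hi = s.charCodeAt(i);
    if (hi >= 0xd800 && hi <= 0xdbff && i + 1 < s.length) {
      const lo = s.charCodeAt(i + 1);
      if (lo >= 0xdc00 && lo <= 0xdfff) {
        // Standard UTF-16 decoding of a high/low surrogate pair.
        out.push(0x10000 + ((hi - 0xd800) << 10) + (lo - 0xdc00));
        i++; // skip the low surrogate just consumed
        continue;
      }
    }
    // BMP code unit, or a lone surrogate passed through unchanged
    // (a lone surrogate is a code point but not a scalar value).
    out.push(hi);
  }
  return out;
}

// U+1D11E MUSICAL SYMBOL G CLEF, written as a surrogate pair.
const clef = "\uD834\uDD1E";
const clefUnitCount = clef.length;       // counts code units, not characters
const clefPoints = toCodePoints(clef);   // one entry for the whole pair
```

With scalar values as the access unit, the string itself would report a
length of one here and no such helper would be needed.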

Of course, a secondary motivation is to simplify (the domain and range of)
text processing functions, but that should be a second-order consideration,
and I would suggest that grapheme clusters (which certainly do have a role
at the text processing layer) are best kept out of the characterization of
this possible transition.
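
As a small illustration of why grapheme clusters sit at a different layer,
consider a base letter followed by a combining mark. The example string and
variable names below are my own and are only a sketch: the sequence is two
scalar values (and two code units, since both are in the BMP), yet a reader
perceives a single character. Moving from code units to scalar values does
not change that count.

```typescript
// U+0065 LATIN SMALL LETTER E followed by U+0301 COMBINING ACUTE ACCENT.
const decomposed = "e\u0301";
// U+00E9 LATIN SMALL LETTER E WITH ACUTE, the precomposed form.
const precomposed = "\u00E9";

// Both code points of the decomposed form are below U+10000, so the
// code unit count equals the scalar value count.
const decomposedCount = decomposed.length;
const precomposedCount = precomposed.length;

// Plain string comparison works on code units and treats the two
// spellings of the same user-perceived character as different.
const sameString = decomposed === precomposed;
```

Deciding that both spellings are "one character" requires segmentation or
normalization rules, which belong to text processing rather than to the
choice of string access unit.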

G.

Received on Saturday, 25 February 2012 17:26:59 UTC