Re: [whatwg] Can the maximum allowed value length be changed to restrict the number of characters? from Jukka K. Korpela on 2013-08-20 (public-whatwg-archive@w3.org from August 2013)

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Tue, 20 Aug 2013 17:21:35 +0300
To: whatwg@lists.whatwg.org
Message-ID: <52137B6F.8050306@cs.tut.fi>

2013-08-20 17:09, Anne van Kesteren wrote:

> On Tue, Aug 20, 2013 at 12:30 AM, Ryosuke Niwa <rniwa@apple.com> wrote:
>> Can the specification be changed to use the number of composed character sequences instead of the code-unit length?
>
> In a way I guess that's nice, but it also seems confusing that given
>
> data:text/html,<input type=text maxlength=1>
>
> pasting in U+0041 U+030A would give a string that's longer than 1 from
> JavaScript's perspective.

Oh, right, this is a different issue from the non-BMP issue I discussed 
in my reply. This one is even clearer, in my opinion, since U+0041 U+030A 
is clearly two Unicode characters, not one, even though it is expected to 
be rendered as “Å” and even though U+00C5 is canonically equivalent to 
U+0041 U+030A.
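
To make the point concrete, here is a minimal sketch in JavaScript. It assumes an engine that supports String.prototype.normalize (ES2015 or later); the variable names are only illustrative. It shows that the decomposed sequence has length 2, the precomposed letter has length 1, and the two compare equal only after normalization.

```javascript
// "A" followed by COMBINING RING ABOVE: two code points, two code units.
const decomposed = "\u0041\u030A";
// LATIN CAPITAL LETTER A WITH RING ABOVE: one code point, one code unit.
const precomposed = "\u00C5";

// .length counts UTF-16 code units, which is what maxlength is compared against.
const decomposedLength = decomposed.length;   // 2
const precomposedLength = precomposed.length; // 1

// The strings are canonically equivalent, but not equal as code unit sequences.
const equalAsStrings = decomposed === precomposed;                  // false
const equalAfterNFC = decomposed.normalize("NFC") === precomposed;  // true

console.log(decomposedLength, precomposedLength, equalAsStrings, equalAfterNFC);
```

So with maxlength=1, the precomposed form would fit and the decomposed form would not, although both are expected to render as “Å”.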

> I don't think there's any place in the
> platform where we measure string length other than by number of code
> units at the moment.

Besides, if “character” means something other than a Unicode character 
(a Unicode code point assigned to a character) or, as a different concept, 
a Unicode code unit, then the question arises what it does mean. For 
example, would a letter followed by 42 combining marks still be one 
character? (Such monstrosities are actually used, in an attempt to 
create “funny” effects.)
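
The three candidate meanings of “length” can be compared side by side. The following is only a sketch: it assumes an engine with Intl.Segmenter for the grapheme count (and skips that count otherwise), and the helper names countCodeUnits, countCodePoints and countGraphemes are made up for illustration.

```javascript
// Count UTF-16 code units (what .length and maxlength use).
function countCodeUnits(s) {
  return s.length;
}

// Count Unicode code points (string iteration walks by code point).
function countCodePoints(s) {
  return Array.from(s).length;
}

// Count grapheme clusters, if the engine provides Intl.Segmenter.
// Returns undefined when the API is not available.
function countGraphemes(s) {
  if (typeof Intl === "undefined" || typeof Intl.Segmenter !== "function") {
    return undefined;
  }
  const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
  return Array.from(segmenter.segment(s)).length;
}

// A letter followed by 42 combining acute accents.
const stacked = "a" + "\u0301".repeat(42);
// A non-BMP character (MUSICAL SYMBOL G CLEF), for comparison.
const clef = "\u{1D11E}";

console.log(countCodeUnits(stacked), countCodePoints(stacked), countGraphemes(stacked));
console.log(countCodeUnits(clef), countCodePoints(clef), countGraphemes(clef));
```

The stacked string is 43 code units and 43 code points, yet a grapheme-based count would treat it as a single unit; the non-BMP character is 2 code units but 1 code point.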

Yucca

Received on Tuesday, 20 August 2013 14:22:01 UTC