
Re: [whatwg] Can the maximum allowed value length be changed to restrict the number of characters?

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Tue, 20 Aug 2013 16:49:16 +0300
Message-ID: <521373DC.5000508@cs.tut.fi>
To: whatwg@lists.whatwg.org

2013-08-20 2:40, Ryosuke Niwa wrote:

>> http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html#maximum-allowed-value-length
>>
>> Why is the maxlength attribute of the input element specified to
>> restrict the length of the value by the code-unit length?

Apparently because in the DOM, "character" effectively means "code 
unit". In particular, the .value.length property gives the length in 
code units.
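
To make the distinction concrete, here is a minimal sketch in plain
JavaScript. The helper names are made up for illustration and are not
part of any DOM API; the only built-in behavior relied on is that a
string's length counts UTF-16 code units, while iterating a string
yields one item per code point.

```javascript
// Hypothetical helpers (not DOM APIs) for two ways of measuring a value.

// Length in UTF-16 code units, which is what .value.length reports.
function codeUnitLength(value) {
  return value.length;
}

// Length in code points: string iteration keeps surrogate pairs together.
function codePointLength(value) {
  return [...value].length;
}

const bmpChar = "A";            // U+0041, inside the BMP
const nonBmpChar = "\u{10400}"; // U+10400, outside the BMP (surrogate pair)

console.log(codeUnitLength(bmpChar), codePointLength(bmpChar));       // 1 1
console.log(codeUnitLength(nonBmpChar), codePointLength(nonBmpChar)); // 2 1
```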

>> This is counter intuitive for users and authors who typically
>> intend to restrict the length by the number of composed character
>> sequences.

That is true. We should not expect end users to know whether a character 
they enter occupies one code unit or two, i.e. whether it is a BMP 
character or not. Then again, I don't expect most users to enter non-BMP 
characters, though this might be changing as e.g. emoticons become more 
popular.

>> In fact, this is the current shipping behavior of
>> Safari and Chrome.

And IE, but not Firefox. Here's a simple test:

<input maxlength=2 value="&#x10400;">

On Firefox, you cannot add a character to the value, since the length is 
already 2. On Chrome and IE, you can add even a second non-BMP 
character, even though the length then becomes 4. I don't see this as 
particularly logical, though I'm looking at this from the programming 
point of view, not the end user's.
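
The difference can be modeled roughly as two counting policies. This
is only a sketch, not how any browser is actually implemented: the
function names are hypothetical, and counting code points is my
stand-in assumption for whatever notion of "character" Chrome and IE
apply.

```javascript
// Two hypothetical counting policies for a maxlength-style limit.
const byCodeUnits = (s) => s.length;        // what the test shows for Firefox
const byCodePoints = (s) => [...s].length;  // assumption: approximates Chrome/IE

// Would typing `addition` after `current` stay within `maxLength`?
function canAppend(current, addition, maxLength, measure) {
  return measure(current + addition) <= maxLength;
}

const initialValue = "\u{10400}"; // the value from the test, 2 code units
const typedChar = "\u{10401}";    // a second non-BMP character

console.log(canAppend(initialValue, typedChar, 2, byCodeUnits));  // false
console.log(canAppend(initialValue, typedChar, 2, byCodePoints)); // true
console.log((initialValue + typedChar).length);                   // 4
```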

>> Can the specification be changed to use the number of composed
>> character sequences instead of the code-unit length?

In contexts where you want to set maxlength in the first place, your 
reasons might well be related to limitations that apply to the code unit 
length. It's a different thing if the intent is to limit the number of 
visible characters.

Interestingly, an attempt like
<input pattern=.{0,42}>
to limit the number of *characters* to at most 42 seems to fail. 
(Browsers won't prevent the user from typing more, but the control 
starts matching the :invalid selector if you enter characters that 
correspond to more than 42 code units.) The reason is apparently that 
"." means "any character" in the sense "any code unit", counting a 
non-BMP character as two.
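
The same effect can be reproduced with a plain JavaScript regular
expression. The sketch below uses a hypothetical helper that
approximates the whole-value matching of the pattern attribute by
wrapping the pattern in ^(?:...)$; it is not the browsers' actual
code. The "u" flag shown for comparison was only added in ECMAScript
2015, so it was not available when this was written.

```javascript
// Hypothetical helper: the whole value must match the pattern.
// The `flags` parameter is an assumption of this sketch.
function matchesPattern(pattern, value, flags = "") {
  return new RegExp("^(?:" + pattern + ")$", flags).test(value);
}

const twoNonBmp = "\u{10400}\u{10401}"; // 2 code points, 4 code units

// Without the "u" flag, "." matches one code unit at a time.
console.log(matchesPattern(".{0,2}", twoNonBmp));      // false
console.log(matchesPattern(".{0,4}", twoNonBmp));      // true

// With the "u" flag, "." matches a whole code point.
console.log(matchesPattern(".{0,2}", twoNonBmp, "u")); // true
```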

> Also,
> http://www.whatwg.org/specs/web-apps/current-work/multipage/common-input-element-attributes.html#the-maxlength-attribute
> says "if the input element has a maximum allowed value length, then
> the code-unit length of the value of the element's value attribute
> must be equal to or less than the element's maximum allowed value
> length."
>
> This doesn't seem to match the behaviors of existing Web browsers or
> http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html#maximum-allowed-value-length
> unless I'm misreading something.  Namely, the value attribute set in
> the markup or by script isn't automatically truncated at the
> element's maximum allowed value length.

There seems to be a conflict here indeed. It is different from the 
character vs. code unit issue, however.

Definitions in 4.10.21.1 clearly imply that the length of the value of a 
control may exceed the limit set by maxlength. The "Constraints" part 
deals with the question of what happens then (in form submission).

Yucca
Received on Tuesday, 20 August 2013 13:49:48 UTC
