- From: Jukka K. Korpela <jkorpela@cs.tut.fi>
- Date: Tue, 20 Aug 2013 16:49:16 +0300
- To: whatwg@lists.whatwg.org
2013-08-20 2:40, Ryosuke Niwa wrote:

>> http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html#maximum-allowed-value-length
>>
>> Why is the maxlength attribute of the input element specified to
>> restrict the length of the value by the code-unit length?

Apparently because in the DOM, "character" effectively means "code unit". In particular, the .value.length property gives the length in code units.

>> This is counter intuitive for users and authors who typically
>> intend to restrict the length by the number of composed character
>> sequences.

That is true. We should not expect end users to know whether a character they enter occupies one code unit or two, i.e. whether it is a BMP character or not. Then again, I don't expect most users to enter non-BMP characters, though this might be changing as e.g. emoticons become more popular.

>> In fact, this is the current shipping behavior of
>> Safari and Chrome.

And IE, but not Firefox. Here's a simple test:

<input maxlength=2 value="𐐀">

On Firefox, you cannot add a character to the value, since the length is already 2. On Chrome and IE, you can add even a second non-BMP character, even though the length then becomes 4. I don't see this as particularly logical, though I'm looking at this from the programming point of view, not the end user's view.

>> Can the specification be changed to use the number of composed
>> character sequences instead of the code-unit length?

In contexts where you want to set maxlength in the first place, your reasons might well be related to limitations that apply to the code unit length. It's a different thing if the intent is to limit the number of visible characters.

Interestingly, an attempt like

<input pattern=.{0,42}>

to limit the number of *characters* to at most 42 seems to fail. (Browsers won't prevent you from typing more, but the control starts matching the :invalid selector if you enter characters that correspond to more than 42 code units.) The reason is apparently that "." means "any character" in the sense "any code unit", so a non-BMP character counts as two.

> Also,
> http://www.whatwg.org/specs/web-apps/current-work/multipage/common-input-element-attributes.html#the-maxlength-attribute
> says "if the input element has a maximum allowed value length, then
> the code-unit length of the value of the element's value attribute
> must be equal to or less than the element's maximum allowed value
> length."
>
> This doesn't seem to match the behaviors of existing Web browsers or
> http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html#maximum-allowed-value-length
> unless I'm misreading something. Namely, the value attribute set in
> the markup or by script isn't automatically truncated at the
> element's maximum allowed value length.

There seems to be a conflict here indeed. It is different from the character vs. code unit issue, however. Definitions in 4.10.21.1 clearly imply that the length of the value of a control may exceed the limit set by maxlength. The "Constraints" part deals with the question of what happens then (in form submission).

Yucca
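P.S. To make the code unit vs. code point distinction concrete, here is a minimal JavaScript sketch using the same non-BMP character as in the maxlength test. The helper names (codeUnitLength, codePointLength) and the two regular expressions are made up for illustration, and the sketch assumes an engine that supports the ES2015 string iteration and the "u" regular expression flag. It says nothing about how any browser implements maxlength or pattern.

```javascript
// U+10400 DESERET CAPITAL LETTER LONG I, the non-BMP character used in the test.
const deseret = "\u{10400}";

// Length in UTF-16 code units; this is what .value.length reports.
function codeUnitLength(s) {
  return s.length;
}

// Length in code points; string iteration steps one code point at a time.
function codePointLength(s) {
  return Array.from(s).length;
}

// Without the u flag, "." matches a single code unit.
const atMostOneUnit = /^.{0,1}$/;
// With the u flag, "." matches a whole code point.
const atMostOnePoint = /^.{0,1}$/u;

console.log(codeUnitLength(deseret));      // 2
console.log(codePointLength(deseret));     // 1
console.log(atMostOneUnit.test(deseret));  // false
console.log(atMostOnePoint.test(deseret)); // true
```

Note that neither count is the number of composed character sequences: a base letter followed by a combining mark is still two code points.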
Received on Tuesday, 20 August 2013 13:49:48 UTC