[whatwg] maxlength="" feedback

From: Ian Hickson <ian@hixie.ch>
Date: Thu, 29 Aug 2013 19:26:24 +0000 (UTC)
To: WHAT Working Group <whatwg@lists.whatwg.org>
Message-ID: <alpine.DEB.2.00.1308291813240.27209@ps20323.dreamhostps.com>
On Fri, 28 Jun 2013, Steve Hoeksema wrote:
> 
> The current whatwg standard [1] states that maxlength is not a valid 
> attribute for input[type=number].
> 
> I built a form and tested it in Firefox, which honours the maxlength 
> attribute, and then found that Chrome did not.
> 
> I thought this was a bug, so I reported it to Chromium [2], who 
> determined it was not a bug and referred me to whatwg.
> 
> I'm wondering if there is a rationale for not supporting maxlength on a 
> number field, and if not, how I can go about having the standard 
> changed?

Just set the max="" attribute instead.


On Fri, 28 Jun 2013, Steve Hoeksema wrote:
>
> In my specific case, a numeric code with a maximum length.
> 
> Say it's 4 digits, and I'm using Chrome. I can put max=9999, but the 
> browser still allows me to type 12345. It won't allow me to submit the 
> form, and it highlights it as an error, but I can still enter it. Using 
> a maxlength means I can't even enter 12345, and it's obvious that it 
> will only accept 4 digits.

If you have a numeric code (i.e. "0000" is different than "0") then 
type=number is the wrong type; you should instead use:

   <input type=text pattern="[0-9]{4}" maxlength=4 inputmode=numeric>


> Using input[type=text] is not desirable because (e.g.) it pops up an 
> alphabetical keyboard on iOS instead of a numeric keyboard.

That's fixed by inputmode=numeric.


On Fri, 28 Jun 2013, Jukka K. Korpela wrote:
> 
> People want to [specify maxlength on type=number] to cover old browsers 
> that do not support type=number. Such browsers ignore both the type 
> attribute and the max attribute, so to impose *some* limits, people 
> would use maxlength.

That will work, yes. It's not conforming, though; that's deliberate, so 
that validators will warn you that what you're doing isn't going to keep 
working going forward.

Generally speaking, though, even with legacy browsers, if you're asking 
for a _number_ then maxlength="" isn't useful. Suppose you want a number 
between 0 and 9999. Well, "0" and "00000.0000000000" are the same number, 
even though one of them is longer than 4 characters.
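This is easy to confirm in JavaScript:

```javascript
// Two textual forms of the same number: a limit on the text's
// length says nothing useful about the numeric range.
const a = Number("0");
const b = Number("00000.0000000000");
console.log(a === b); // true: both parse to the number 0
```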


On Mon, 19 Aug 2013, Ryosuke Niwa wrote:
> 
> Why is the maxlength attribute of the input element specified to 
> restrict the length of the value by the code-unit length?

That's either what most browsers seemed to do when I tested it, or it was 
the most consistent thing to specify based on other things that were 
consistently implemented (e.g. the ".textLength" attribute's behaviour).


> This is counter intuitive for users and authors who typically intend to 
> restrict the length by the number of composed character sequences.

There's actually a number of possible things people might intuitively 
expect it to do -- count graphemes, count Unicode code points, count 
composed characters, count monospace width, count bytes, etc. It's not 
clear to me that there's one answer, nor that, in fact, most authors have 
any idea that there are so many answers to the question "how long is my 
string".


On Mon, 19 Aug 2013, Ryosuke Niwa wrote:
> 
> Also, 
> http://www.whatwg.org/specs/web-apps/current-work/multipage/common-input-element-attributes.html#the-maxlength-attribute 
> says "if the input element has a maximum allowed value length, then the 
> code-unit length of the value of the element's value attribute must be 
> equal to or less than the element's maximum allowed value length."
> 
> This doesn't seem to match the behaviors of existing Web browsers

That's authoring conformance criteria, not implementation conformance 
criteria.


On Tue, 20 Aug 2013, Jukka K. Korpela wrote:
> 
> Apparently because in the DOM, "character" effectively means "code 
> unit". In particular, the .value.length property gives the length in 
> code units.

Specifically, UTF-16 code units.


> > > In fact, this is the current shipping behavior of Safari and Chrome.
> 
> And IE, but not Firefox. Here's a simple test:
> 
> <input maxlength=2 value="&#x10400;">
> 
> On Firefox, you cannot add a character to the value, since the length is 
> already 2. On Chrome and IE, you can add even a second non-BMP 
> character, even though the length then becomes 4. I don't see this as 
> particularly logical, though I'm looking this from the programming point 
> of view, not end user view.

Which version of IE? I wonder if this changed at some point.


> Interestingly, an attempt like <input pattern=.{0,42}> to limit the 
> amount of *characters* to at most 42 seems to fail. (Browsers won't 
> prevent from typing more, but the control starts matching the :invalid 
> selector if you enter characters that correspond to more than 42 code 
> units.) The reason is apparently that "." means "any character" in the 
> sense "any code point", counting a non-BMP character as two.

This is inherited from JavaScript.
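That inheritance is easy to see in JavaScript regular expressions. (The 
u flag shown below, which makes "." match whole code points, was only 
added in ES2015, after this thread.)

```javascript
const astral = "\u{1012A}"; // the non-BMP character from the quote above
// Without the u flag, "." matches a single UTF-16 code unit:
console.log(/^.$/.test(astral));  // false: the string is two code units
console.log(/^..$/.test(astral)); // true
// With the u flag, "." matches a full code point:
console.log(/^.$/u.test(astral)); // true
```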


On Thu, 22 Aug 2013, Charles McCathie Nevile wrote:
> > 
> > The basic question is whether a validator should flag <input 
> > maxlength="2" value="abc"> as a conformance error or not.  It seems to 
> > me like it should.
> 
> Why? It seems that it generally works in browsers, and has for a long 
> time.

Because it's a likely authoring mistake.

On Thu, 22 Aug 2013, Boris Zbarsky wrote:
> 
> Sort of.  It gets you in a state where the user can erase the "c" but 
> not retype it (though the erasing edit can be undone via the editor's 
> "undo" functionality, apparently)....

Right.

On Tue, 20 Aug 2013, Anne van Kesteren wrote:
>
> I don't think there's any place in the platform where we measure string 
> length other than by number of code units at the moment.

On Tue, 20 Aug 2013, Jukka K. Korpela wrote:
> 
> Oh, right, this is an issue different from the non-BMP issue I discussed 
> in my reply. This is even clearer in my opinion, since U+0041 U+030A is 
> clearly two Unicode characters, not one, even though it is expected to 
> be rendered as “Å” and even though U+00C5 is canonically equivalent to 
> U+0041 U+030A.

"Clearly" is not the word I would use, at least from the user's 
perspective. Why is pressing alt+a on the keyboard two characters? Or is 
it one? How can you tell, as a user? Why is it different than alt+i i? Or 
is it not? How can you tell, as a user?

(It's equally unclear if you use UTF-16 code units, of course.)
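The canonical equivalence mentioned above can be checked with 
String.prototype.normalize (another post-2013 addition):

```javascript
const decomposed  = "\u0041\u030A"; // A + COMBINING RING ABOVE
const precomposed = "\u00C5";       // LATIN CAPITAL LETTER A WITH RING ABOVE
console.log(decomposed === precomposed);                  // false: different code points
console.log(decomposed.normalize("NFC") === precomposed); // true: canonically equivalent
console.log(decomposed.length, precomposed.length);       // 2 1
```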


On Wed, 21 Aug 2013, Alexey Proskuryakov wrote:
> 
> I agree with Darin's comment in that the standard should consider end 
> user concepts more strongly here. WebKit had this more humane behavior 
> for many years, so we know that it's compatible with the Web, and there 
> is no need to chase the lowest common denominator.
> 
> Additionally, there are features in the platform that work with Unicode 
> grapheme clusters perfectly, and I think that these are closely 
> connected to maxLength. Namely, editing functionality understands 
> grapheme clusters very well, so you can change selections by moving 
> caret right or left one "character", and so forth. Web sites frequently 
> perform some editing on the text as you type it.

[...]

These arguments are pretty strong, I think.


On Mon, 19 Aug 2013, Ryosuke Niwa wrote:
>
> Can the specification be changed to use the number of composed character 
> sequences instead of the code-unit length?

Fundamentally I don't think there's much to argue for one way or the other 
here. Every answer is unintuitive or bad to someone.

There's a consistency argument -- everything in HTML and JS operates on 
UTF-16 code units, so this should too -- but it's not very strong.

From testing, it seems to me that browsers vary in what they do. For 
example, I can enter "üü" in Chrome and Firefox (U+00FC), but "oᷘoᷘ" in 
Chrome and only "oᷘ" in Firefox ("o" with U+1DD8). Similarly, I can enter 
"𐄪𐄪" in Chrome but only "𐄪" in Firefox (U+1012A). However, I can only 
enter one "क्षि" in Chrome, and can only enter half of it in Firefox 
(U+0915 U+094D U+0937 U+093F, Devanagari kshi), and I can enter any number 
of "ििििििििि"s in Chrome, but only two in Firefox (U+093Fs). And could 
only enter "षष" in Chrome, not any number of those (U+0937). And weirdly, 
while I can paste each part of "🇨🇭" as a separate character, I can only 
backspace it as one, but it counts as two for maxlength="" purposes 
(U+1F1E8 U+1F1ED). Safari seems to behave the same as Chrome, 
unsurprisingly. I couldn't test IE today.

I can't work out what this means for WebKit and Blink -- what are they 
doing? Some sort of old grapheme cluster definition that isn't quite the 
legacy grapheme cluster definition? Why doesn't it match what the cursor 
code is doing? If we're going to go with a user-friendly definition, it 
seems that matching user behaviour makes the most sense.

The current definition (UTF-16 code units) has the benefit of being very 
clear, if odd from any other perspective.

I don't mind using some other definition, if the browsers are going to 
implement it. But it's not clear to me what the definition should be. What 
Chrome and Safari are doing today isn't a sane answer, IMHO.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Thursday, 29 August 2013 19:26:56 UTC
