[whatwg] textarea newline format - raw value vs. transformed value and setSelectionRange

On Sun, 10 Oct 2010, Michael A. Puls II wrote:
>
> Consider the following [simplified]:
> 
> <!DOCTYPE html>
> <title></title>
> <script>
>   window.addEventListener("DOMContentLoaded", function() {
>       var ta = document.getElementsByTagName("textarea")[0];
>       ta.value = ta.value.replace(/\r|\n/g, encodeURIComponent);
>   }, false);
> </script>
> <textarea rows="3">Line 1
> Line 2
> Line 3</textarea>
> 
> The behavior between Firefox 4 latest trunk and Opera 10.70 latest 
> snapshot is different because they're using different newline formats.

The correct behaviour is that the element's value becomes
   "Line 1%0ALine 2%0ALine 3"

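To make that concrete, here is a minimal sketch of the same replace() call run on a plain string rather than a live textarea. The variable names are made up for illustration, and the string is only a stand-in for the raw value the parser would hand back (LF-only newlines, assuming no script has added CRs).

```javascript
// Stand-in for ta.value right after parsing: newlines are LF only.
const parsedRawValue = "Line 1\nLine 2\nLine 3";

// Same call as in the test case. replace() passes the matched text as the
// first argument, so each CR or LF is percent-encoded on its own.
const encodedValue = parsedRawValue.replace(/\r|\n/g, encodeURIComponent);
console.log(encodedValue); // Line 1%0ALine 2%0ALine 3

// For comparison: what a getter returning CRLF-normalized text would yield.
const crlfEncodedValue = "Line 1\r\nLine 2\r\nLine 3".replace(/\r|\n/g, encodeURIComponent);
console.log(crlfEncodedValue); // Line 1%0D%0ALine 2%0D%0ALine 3
```

Seeing "%0D%0A" in the textarea is therefore a sign that the getter is handing back something other than the raw value.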

> See step 1 at
> <http://www.whatwg.org/specs/web-apps/current-work/multipage/the-button-element.html#attr-textarea-wrap-hard-state>.
> 
> That says that the 'value' getter returns the raw value + newlines normalized
> to "\r\n".

No, it says that the submission value has that transformation applied. The 
'.value' getter returns the _raw_ value, which doesn't have U+000Ds added 
by the user agent (they can only be there if the script added them).
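
Roughly, the distinction looks like the sketch below. The helper name is hypothetical and this is not spec text: it models only the newline step being discussed here and leaves out the line-break insertion that the hard wrap state adds.

```javascript
// Hypothetical helper: turn every CRLF pair, lone CR, or lone LF into CRLF,
// which is the newline normalization applied for submission.
function toSubmissionNewlines(rawText) {
  return rawText.replace(/\r\n|\r|\n/g, "\r\n");
}

// What .value would return for the parsed textarea: the raw value, LF only.
const getterValue = "Line 1\nLine 2\nLine 3";

// What would go on the wire when the form is submitted.
const submittedValue = toSubmissionNewlines(getterValue);

console.log(JSON.stringify(getterValue));    // "Line 1\nLine 2\nLine 3"
console.log(JSON.stringify(submittedValue)); // "Line 1\r\nLine 2\r\nLine 3"
```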


> I always thought that meant that the raw value (what was parsed into the 
> DOM)

The "raw value" is what the user edits.


> contained newlines normalized to "\r\n" too for textareas and that a 
> browser with an HTML5 parser like Firefox would automatically show 
> newlines normalized to "\r\n" without even having a conversion done 
> (internally) for the 'value' getter.

No, the HTML parser strips U+000D characters ("\r").


> I'm also not sure whether step 1 applies to the 'value' setter. It looks 
> like it doesn't, but I can't tell for sure.

It doesn't apply to .value at all, only to the element's 'value' concept, 
which is used in form submission and constraint validation.


> Also, I'm not sure if setSelectionRange() should operate on the raw 
> value, or the transformed value in step 1.

Raw value, because <textarea> is defined as an element that "represents a 
multiline plain text edit control for the element's raw value".


> Opera's not using an HTML5 parser yet, so I can't check what it might 
> do, but could this be clarified?

It's not clear to me what isn't clear. :-) Could you elaborate on what the 
spec says that led you to your interpretation?


> In 
> <http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#preprocessing-the-input-stream> 
> it says:
> 
> "U+000D CARRIAGE RETURN (CR) characters and U+000A LINE FEED (LF) 
> characters are treated specially. Any CR characters that are followed by 
> LF characters must be removed, and any CR characters not followed by LF 
> characters must be converted to LF characters. Thus, newlines in HTML 
> DOMs are represented by LF characters, and there are never any CR 
> characters in the input to the tokenization stage."
> 
> Does that mean that the raw value of the parsed textarea should only 
> ever have '\n' for newlines (unless the 'value' setter is used in JS to 
> introduce '\r' characters)?

Yes.
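
The preprocessing rule quoted above can be sketched as a single replace. The function name is invented for illustration; a real parser does this on the input stream, not on a JavaScript string.

```javascript
// Hypothetical model of the parser's newline preprocessing:
// a CR followed by LF is removed (leaving the LF), and a CR not
// followed by LF is converted to LF. Both cases collapse to "\n".
function normalizeParserNewlines(input) {
  return input.replace(/\r\n?/g, "\n");
}

const fromNetwork = "Line 1\r\nLine 2\rLine 3\nLine 4";
const tokenizerInput = normalizeParserNewlines(fromNetwork);

console.log(JSON.stringify(tokenizerInput)); // "Line 1\nLine 2\nLine 3\nLine 4"
console.log(tokenizerInput.includes("\r"));  // false
```

So whatever newline format the markup was served with, the textarea's raw value starts out with LF only.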


> If so, does that mean that setSelectionRange() should operate on the 
> raw, internal value (that just has '\n' for newlines in it normally), 
> but the 'value' getter still returns the transformed value with newlines 
> normalized to "\r\n"?

The value getter doesn't return the transformed value. See the definition 
of the value getter for details.


> I see 
> <http://www.whatwg.org/specs/web-apps/current-work/multipage/editing.html#dom-textarea/input-setselectionrange>, 
> but it doesn't mention this.

I've clarified the spec to indicate that setSelectionRange() and company 
operate on the raw value.
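
To illustrate why the distinction matters for selection offsets, here is a sketch using plain strings (names are made up; in a page the offsets would be passed to ta.setSelectionRange(start, end)).

```javascript
// Raw value as parsed: LF-only newlines.
const selectionRaw = "Line 1\nLine 2\nLine 3";

// The same text with newlines normalized to CRLF, for comparison only.
const selectionCrlf = selectionRaw.replace(/\r\n|\r|\n/g, "\r\n");

// Offsets of "Line 2" in each representation.
const rawStart = selectionRaw.indexOf("Line 2");
const rawEnd = rawStart + "Line 2".length;
const crlfStart = selectionCrlf.indexOf("Line 2");

console.log(rawStart, rawEnd); // 7 13
console.log(crlfStart);        // 8

// Since the selection APIs operate on the raw value, the call that
// selects "Line 2" would be: ta.setSelectionRange(rawStart, rawEnd);
console.log(selectionRaw.slice(rawStart, rawEnd)); // Line 2
```

Each preceding newline shifts the CRLF offsets by one more character, so an implementation counting against the transformed text would drift further off with every line.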

Cheers,
-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Tuesday, 4 January 2011 16:38:17 UTC