[whatwg] input element's value should not be sanitized during parsing

(Sorry to bring back an old thread. I'm trying to catch up on old
to-dos now that FF4 is almost out the door.)

On Tue, Dec 28, 2010 at 11:46 PM, Ian Hickson <ian at hixie.ch> wrote:
> On Mon, 20 Sep 2010, Mounir Lamouri wrote:
>>
>> With the current specification, these two elements will not have the
>> same value:
>> <input value="foo&#13;bar" type='hidden'>
>> <input type='hidden' value="foo&#13;bar">
>
> Yes they will. The attribute order has no effect. Elements are created
> by the parser with their attributes already set:
>
> # When the steps below require the UA to create an element for a token in
> # a particular namespace, the UA must create a node implementing the interface
> # appropriate for the element type corresponding to the tag name of the
> # token in the given namespace (as given in the specification that defines
> # that element, e.g. for an a element in the HTML namespace, this
> # specification defines it to be the HTMLAnchorElement interface), with
> # the tag name being the name of that element, with the node being in the
> # given namespace, and with the attributes on the node being those given
> # in the given token.
> -- http://www.whatwg.org/specs/web-apps/current-work/complete.html#create-an-element-for-the-token

Except that I don't think this is how any implementation actually
works. Nor do I have any desire to write the implementation this way,
since it means duplicating a lot of code. I'd have to implement
attribute behavior twice: once in a special code path triggered during
element creation, and once in the code that reacts to attribute
changes made through setAttribute/removeAttribute.

So far this hasn't been needed and the parsing code basically just
calls setAttribute. Unless there are really good reasons to change
this I'd like to avoid it. So far I haven't heard of any such reasons.
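
To make the shape of that concrete, here is a minimal sketch of a parser
that creates elements through the same setAttribute path that script
uses. All names in it (SketchElement, attributeChanged,
createElementForToken, the token layout) are made up for illustration
and are not Gecko's actual code:

```javascript
// Hypothetical minimal element model; an illustration, not Gecko's code.
class SketchElement {
  constructor(tagName) {
    this.tagName = tagName;
    this.attributes = new Map();
    this.changeLog = []; // records every attribute-change notification
  }

  // Single entry point for attribute changes, shared by script and parser.
  setAttribute(name, value) {
    const oldValue = this.attributes.has(name) ? this.attributes.get(name) : null;
    this.attributes.set(name, value);
    this.attributeChanged(name, oldValue, value);
  }

  removeAttribute(name) {
    if (!this.attributes.has(name)) return;
    const oldValue = this.attributes.get(name);
    this.attributes.delete(name);
    this.attributeChanged(name, oldValue, null);
  }

  // Attribute-driven behavior would live here, implemented exactly once.
  attributeChanged(name, oldValue, newValue) {
    this.changeLog.push({ name, oldValue, newValue });
  }
}

// Parser-side element creation: one setAttribute call per token
// attribute, in source order. The token shape is an assumption.
function createElementForToken(token) {
  const element = new SketchElement(token.tagName);
  for (const [name, value] of token.attributes) {
    element.setAttribute(name, value);
  }
  return element;
}
```

With this structure there is no second, creation-only code path to keep
in sync with attributeChanged.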

> On Tue, 21 Sep 2010, Boris Zbarsky wrote:
>>
>> Where does it say that it's atomic? I don't see that anywhere (and in
>> fact, the "create an element" code in the Gecko parser is most decidedly
>> non-atomic). Now maybe the spec intends this to be an atomic operation;
>> if so it needs to say that.
>
> The operation it describes is a single operation: create a node. It
> describes various constraints on that operation, one of which is that the
> node have the various tokenised attributes set. I don't understand how
> creating a node could be anything other than atomic -- either it exists or
> it does not.

You're expecting several operations to happen at the same time. We
could certainly manually insert the attributes and their values into
the data structure inside the element which stores the attribute
name/value pairs. However, at some point we need to update all of the
state that these values drive: things like sticking elements into
id-hashes, storing the calculated type of an input, calculating the
effective URI of an image, etc. This involves several separate pieces
of state and so can't happen "all at the same time".
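
As a rough illustration of the separate pieces of state involved, the
sketch below updates an id table, a calculated input type, and a
resolved image URI as each attribute is set, and snapshots the state
after every step. The names (idTable, StatefulElement, applyAndObserve)
are hypothetical, and the type calculation is deliberately simplified:

```javascript
// Hypothetical stand-ins for engine internals; not taken from Gecko.
const idTable = new Map(); // document-wide id -> element lookup

class StatefulElement {
  constructor(tagName, baseURL) {
    this.tagName = tagName;
    this.baseURL = baseURL;
    this.attributes = new Map();
    this.inputType = "text"; // calculated type of an input
    this.resolvedSrc = null; // effective URI of an image
  }

  setAttribute(name, value) {
    const oldValue = this.attributes.get(name);
    this.attributes.set(name, value);
    // Each attribute drives a different piece of derived state.
    switch (name) {
      case "id":
        if (oldValue !== undefined && idTable.get(oldValue) === this) {
          idTable.delete(oldValue);
        }
        idTable.set(value, this);
        break;
      case "type":
        // Simplified: a real engine would map unknown types to "text".
        this.inputType = value.toLowerCase();
        break;
      case "src":
        this.resolvedSrc = new URL(value, this.baseURL).href;
        break;
    }
  }
}

// Apply attributes one by one and record the derived state after each
// step, showing that the updates are observable as separate steps.
function applyAndObserve(element, pairs) {
  const snapshots = [];
  for (const [name, value] of pairs) {
    element.setAttribute(name, value);
    snapshots.push({
      after: name,
      inIdTable: idTable.get(element.attributes.get("id")) === element,
      inputType: element.inputType,
      resolvedSrc: element.resolvedSrc,
    });
  }
  return snapshots;
}
```

Even if the name/value pairs were stored in one go, the three derived
values would still be brought up to date one after another.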

> On Tue, 21 Sep 2010, Boris Zbarsky wrote:
>>
>> That doesn't work if your parser and DOM aren't very very _very_ tightly
>> coupled, since there are no DOM APIs to "atomically" set a bunch of
>> attributes.
>
> The HTML spec in general assumes that the implementation of the parser is
> the implementation of the DOM and that you wouldn't use the DOM Core API
> to implement the DOM or the parser.

I wouldn't build a parser on the raw DOM API either, but mostly for
performance reasons, since we have to do a lot more checks on data that
comes from untrusted script (things like preventing ancestor cycles).
But I'd also strongly want to share most of the code path between the
DOM API and the API that the parser uses. Not doing that is going to
lead to a lot more bloat and a lot more bugs.

> On Tue, 21 Sep 2010, Jonas Sicking wrote:
>>
>> Also, it would mean that the following two pieces of code behave differently:
>>
>> inp = document.createElement("input");
>> inp.setAttribute("value", "foo\nbar");
>> inp.setAttribute("type", "hidden");
>>
>> and
>>
>> inp = document.createElement("input");
>> inp.setAttribute("type", "hidden");
>> inp.setAttribute("value", "foo\nbar");
>>
>> This does not seem desirable.
>
> I can't argue that it's desirable, but it's how the Web works, as I
> understand it.

Gecko doesn't exhibit this behavior, and I don't know of any sites that
don't work in Gecko because of this.
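
For illustration only, here is one way an implementation could end up
order-independent for the two snippets quoted above: keep the raw
attribute string and apply the line-break stripping for text inputs
when the value is read, based on the type at that moment. This is a
hypothetical model (SketchInput and build are made-up names, and only
"text" and "hidden" are handled); it is not a description of how Gecko
is actually implemented:

```javascript
// Hypothetical model; an assumption for illustration, not Gecko's code.
class SketchInput {
  constructor() {
    this.attributes = new Map();
  }

  setAttribute(name, value) {
    // Store the raw string; nothing is sanitized at set time.
    this.attributes.set(name, String(value));
  }

  // Only "hidden" and the default "text" are modelled here.
  get type() {
    const raw = (this.attributes.get("type") || "text").toLowerCase();
    return raw === "hidden" ? "hidden" : "text";
  }

  // Sanitization depends on the current type, not on the type that was
  // in effect when the value attribute happened to be set.
  get value() {
    const raw = this.attributes.get("value") || "";
    return this.type === "text" ? raw.replace(/[\r\n]/g, "") : raw;
  }
}

// Build an input by applying [name, value] pairs in the given order.
function build(pairs) {
  const inp = new SketchInput();
  for (const [name, value] of pairs) {
    inp.setAttribute(name, value);
  }
  return inp;
}

const valueFirst = build([["value", "foo\nbar"], ["type", "hidden"]]);
const typeFirst = build([["type", "hidden"], ["value", "foo\nbar"]]);
console.log(valueFirst.value === typeFirst.value); // true
```

In this model both orders yield "foo\nbar" for the hidden input, because
the stored attribute is never rewritten.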

/ Jonas

Received on Friday, 11 March 2011 15:56:54 UTC