Re: Proposal: Document.parse() [AKA: Implied Context Parsing] from Rafael Weinstein on 2012-06-04 (public-webapps@w3.org from April to June 2012)

From: Rafael Weinstein <rafaelw@google.com>
Date: Mon, 4 Jun 2012 16:21:57 -0700
To: Ian Hickson <ian@hixie.ch>
Cc: Webapps WG <public-webapps@w3.org>
Message-ID: <CABMdHiQVc51SHTwocKUa3Ldn_ZDoA7JhinKFOGRVCEUbvrCoyA@mail.gmail.com>

Just to be clear: what you are objecting to is the addition of a formal
API for this.

You're generally supportive of adding a <template> element whose
contents would parse the way we're discussing here -- and given that,
a webdev could trivially polyfill Document.parse().
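
A rough sketch of such a polyfill, assuming a <template> element that parses its contents with implied context as discussed in this thread. The name parseViaTemplate and the explicit `doc` argument are made up for illustration; none of this is proposed API.

```javascript
// Hypothetical helper, not part of any spec. Assumes `doc` is a Document
// whose <template> element parses markup such as '<td>x</td>' without
// needing a hand-built <table>/<tr> wrapper.
function parseViaTemplate(doc, markup) {
  var template = doc.createElement('template');
  // The element's parser does the context picking for us.
  template.innerHTML = markup;
  // `content` is the DocumentFragment holding the parsed nodes.
  return template.content;
}
```

A page would call it as parseViaTemplate(document, '<td>cell</td>') and append the returned fragment wherever it is needed.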

I.e. you're ok with the approach of the parser picking a context
element based on the contents of markup, but against giving webdevs
the impression that innerHTML is good practice, by adding more API in
that direction?
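
To make "picking a context element based on the contents of markup" concrete, here is a minimal sketch. The lookup table and the name pickContextElement are my own illustration, assuming the choice is driven by the first start tag in the string; the actual parser changes may well differ.

```javascript
// Illustrative mapping only: first tag name -> assumed context element.
var CONTEXT_FOR_TAG = {
  td: 'tr', th: 'tr',
  tr: 'tbody',
  tbody: 'table', thead: 'table', tfoot: 'table',
  caption: 'table', colgroup: 'table',
  col: 'colgroup',
  option: 'select', optgroup: 'select'
};

function pickContextElement(markup) {
  // Skip leading whitespace and comments, then capture the first tag name.
  var match = /^\s*(?:<!--[\s\S]*?-->\s*)*<([a-zA-Z][a-zA-Z0-9]*)/.exec(markup);
  if (!match) return 'body'; // plain text or empty input: assume <body>
  var tag = match[1].toLowerCase();
  return Object.prototype.hasOwnProperty.call(CONTEXT_FOR_TAG, tag)
    ? CONTEXT_FOR_TAG[tag]
    : 'body';
}
```

With this table, '<td>x</td>' would be parsed as if it were inside a <tr>, while '<p>x</p>' falls back to <body>.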

Put another way, though you're not happy with adding the API, you're
willing to set that aside and help spec the parser changes required
for both this and the <template> element (assuming the remaining issues
with <template> can be agreed upon)?

FWIW, I agree with Hixie in principle, but disagree in practice. I
think innerHTML is generally to be avoided, but I feel that adding
Document.parse() improves the situation by making some current uses
(which aren't likely to go away) less hacky. Also, I'm not as worried
about webdevs taking the wrong message from us adding API. My feeling
is that they just do what works best for them and don't think much
about what we are or are not encouraging.

Also, I'm highly supportive of the goal of allowing HTML literals in
script. I fully agree that better load ("compile") time feedback would
be beneficial to authors here.

On Mon, Jun 4, 2012 at 3:47 PM, Ian Hickson <ian@hixie.ch> wrote:
> On Fri, 25 May 2012, Rafael Weinstein wrote:
>>
>> Now's the time to raise objections to UA's adding support for this
>> feature.
>
> For the record, I very much object to Document.parse(). I think it's a
> terrible API. We should IMHO resolve the use case of "generate a DOM tree
> from script" using a much more robust solution that has compile-time
> syntax checking and so forth, rather than relying on the super-hacky
> "concatenate a bunch of strings and then parse them" solution that authors
> are forced to use today.
>
> innerHTML and document.write() are abominations unto computer science, and
> we are doing nobody any favours by continuing the platform down this road.
> They lead to programming styles that are rife with injection bugs (XSS),
> they are extremely difficult to debug and maintain, and they are terribly
> complicated to implement compared to more structured alternatives. The
> core reasons for these problems, IMHO, are two-fold:
>
>  1. Lack of compile-time syntax checking, which leads to typos not being
>    caught and thus programmer intent not being faithfully represented,
>    and
>  2. Putting markup syntax and data at the same level, instead of
>    separating them as with other features in JS.
>
> For example, this kind of bug is easy to introduce and hard to spot or
> debug:
>
>   var heading = '<h1>Hello</h1>';
>   // ...
>   div.innerHTML = '<h1>' + heading + '</h1>';
>
> Even worse are things like typos:
>
>   tr.innerHTML = '<td>' + c1 + '</td><td>' + c2 + '</td><dt>' + c3 + '</td>';
>
> Compile-time syntax checking makes this a non-issue. Making data variables
> be qualitatively different than the syntax also solves problems, e.g.:
>
>   var title = "I hate </p> tags.";
>   // ...
>   div.innerHTML = '<p>Today\'s topic is: ' + title + '</p>'; // oops, not escaped
>
>
> There have been several alternative proposals; my personal favourite is
> Anne's E4H solution, basically E4X but simplified just for HTML, which
> I've written a strawman spec for here:
>
>   http://www.hixie.ch/specs/e4h/strawman
>
> I'm happy to write a more serious spec for this if this is something
> anyone is interested in implementing. The above examples become much
> easier to debug. The first one results in very ugly markup visible in the
> output of the page rather than in the weird spacing:
>
>   var heading = '<h1>Hello</h1>';
>   // ...
>   div.appendChild(<h1>{heading}</h1>);
>
> The second results in a compile-time syntax error so would be caught even
> before the code is reviewed:
>
>   tr.appendChild(<><td>{c1}</td><td>{c2}</td><dt>{c3}</td></>);
>
> The third becomes a non-issue because you don't need to escape text to
> keep it from being mistaken for markup [1]:
>
>   var title = "I hate </p> tags.";
>   // ...
>   div.innerHTML = <p>Today's topic is: {title}</p>;
>
>
> Other proposed solutions include Element.create(), which is less verbose
> than the DOM but still more verbose than innerHTML or E4H; and
> quasistrings, which still suffer from lack of compile-time checking and
> mix markup with data, but at least would be more structured than raw
> strings and could offer better injection protection.
>
>
> [1] (This is not the same as auto-escaping strings in other contexts. For
> example, E4H doesn't propose to have CSS literals, so a string embedded in
> a style="" attribute wouldn't be automagically safe.)
>
> --
> Ian Hickson               U+1047E                )\._.,--....,'``.    fL
> http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
> Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Monday, 4 June 2012 23:22:27 UTC