- From: Adam Barth <w3c@adambarth.com>
- Date: Thu, 7 Mar 2013 17:37:07 -0800
- To: mikesamuel <mikesamuel@gmail.com>
- Cc: Brendan Eich <brendan@secure.meer.net>, Ian Hickson <ian@hixie.ch>, Rick Waldron <waldron.rick@gmail.com>, Ojan Vafai <ojan@chromium.org>, "rafaelw@chromium.org" <rafaelw@chromium.org>, Adam Klein <adamk@chromium.org>, Anne van Kesteren <annevk@annevk.nl>, Alex Russell <slightlyoff@chromium.org>, "public-script-coord@w3.org" <public-script-coord@w3.org>
On Thu, Mar 7, 2013 at 5:18 PM, Adam Barth <w3c@adambarth.com> wrote:
> I don't think I fully understood your message because it was quite
> long and contained many complex external references. What I've
> understood you to say is that you've managed to work around the
> limitations of the current string-based template design by building a
> complex mechanism for automatically escaping untrusted data.

As an example, in browsing the source code of the autoescaping code
you referenced, I found the following line:

var HTML_TAG_REGEX_ = /<(?:!|\/?[a-z])(?:[^>'"]|"[^"]*"|'[^']*')*>/gi;

As famously written on Stack Overflow [1], "Regex is not a tool that
can be used to correctly parse HTML."

In any case, we shouldn't require folks to write a thousand lines of
JavaScript to use ECMAScript templates to safely produce HTML. That's
a clear signal that we should revisit the design of the template
system.

Adam

[1] http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags

> Rather than forcing authors to layer complex (and therefore
> error-prone) systems on top of a string-based template system, we
> should instead provide authors with an AST-based template system that
> avoids these security pitfalls.
>
> Adam
>
>
> On Thu, Mar 7, 2013 at 5:02 PM, Mike Samuel <mikesamuel@gmail.com> wrote:
>> Adam,
>>
>> I wrote some of the string template proposal, rewrote the template
>> system that Google+ used to take the burden of XSS safety off app
>> developers' shoulders, and more generally work on programming-language
>> & tool approaches to software security.
>>
>> On Thu, 7 Mar 2013 Adam Barth said
>>> The general problem with template strings is that they're an XSS risk.
>>> Essentially, we're encouraging authors to mix untrusted data into
>>> strings that will later be parsed by the HTML parser. If the attacker
>>> is clever in selecting these untrusted strings, he'll be able to cause
>>> the remainder of the string to be parsed differently than the author
>>> intends.
>>
>> Are you familiar with
>> https://js-quasis-libraries-and-repl.googlecode.com/svn/trunk/index.html
>> ?
>>
>> The "Safe HTML with bad inputs" example shows contextual auto-escaping
>> using string templates.
>>
>>> var firstName = [...];
>>> var lastName = [...];
>>> header.innerHTML = `<h1>Welcome ${ firstName } ${ lastName }!</h1>`;
>>>
>>> If firstName and lastName are user-controlled (i.e., untrusted),
>>> the above is an XSS vulnerability. For example, the attacker can set
>>> firstName to "<img onerror='alert(/pwned/)'>".
>>
>> I strongly agree that safety should be the default.
>>
>> I would very much like the default to be overridable to be a
>> late-binding producer of string-like values that distinguishes trusted
>> substrings so that they can be auto-escaped based on context as
>> described at http://google-caja.googlecode.com/svn/changes/mikesamuel/string-interpolation-29-Jan-2008/trunk/src/js/com/google/caja/interp/index.html
>>
>> I think targeting popular libraries is the best way to get this, since
>> one typically wants to be able to push a new version of
>> security-sensitive code more quickly than one pushes new language
>> specifications.
>>
>>
>>> We have lots of implementation experience with these sorts of
>>> string-based template systems because they're widely used in languages
>>> like PHP. Our broad experience is that they lead to buggy, XSS-prone
>>> code.
>>>
>>> The general anti-pattern to avoid is the following:
>>>
>>> template + input -> string -> HTML parser -> DOM
>>>
>>> A more secure approach is to first parse the template into a DOM and
>>> then add the untrusted input into the DOM as text nodes. In this
>>> approach, the attacker's maliciously crafted firstName would simply
>>> end up as a text node and would not execute as script. (You might or
>>> might not like other aspects of E4H, but one of its virtues is that it
>>> follows this more secure pattern.)
>>
>> The DOM approach suffers several drawbacks:
>> 1. It's resistant to XSS but not robust since it doesn't deal with
>> embedded languages. It trivially fails when substitutions appear
>> inside URI attributes, or text nodes inside a script or style
>> attribute.
>> 2. It's tied to a particular language. If we wouldn't introduce new
>> syntax specifically for SQL prepared statements, we shouldn't do it
>> for the HTML equivalent and instead come up with a single syntactic
>> construct that allows safe composition in any language.
>> 3. It fails the ubiquitous <header><body><footer> pattern as described
>> at https://js-quasis-libraries-and-repl.googlecode.com/svn/trunk/safetemplate.html
>>
>> The DOM approach can be generalized to a parse-tree approach to solve
>> embedded languages as done by Yesod (
>> http://yannesposito.com/Scratch/en/blog/Yesod-tutorial-for-newbies/#bulletproof
>> ).
>>
>> Yesod and similar approaches don't provide a good migration target for
>> existing ad-hoc composition methods and at the end of this email, I
>> include a mini-progress report on my attempt to comprehensively
>> address content-composition in a way that I believe is much easier to
>> use than Yesod. I believe Yesod also requires significant
>> per-content-language work in the type-system and in hand-written
>> encoders, and would be impossible to port from Haskell to stringly
>> typed code.
>>
>>> I understand that someone (either the author or the browser) could
>>> write an HTML tag for template strings that implements the more secure
>>
>> Already done. See link above.
>>
>>> pattern, but most authors will simply use the default mode, which
>>> follows the insecure pattern. As a result, this language feature will
>>> lead to many XSS vulnerabilities and general sadness in the world.
>>
>> I disagree. Without this, people will continue to use
>>
>> header.innerHTML = "<h1>Welcome " + firstName + " " + lastName + "!</h1>";
>>
>> leading to great sadness, or if templates are based on the HTML DOM,
>> we will just have other injection attacks instead, still leading to
>> general sadness.
>>
>> XSS is a special case of code injection, so to avoid "general sadness"
>> we need to generalize to a principled approach to code
>> injection that
>> 1. deals with embedded languages
>> 2. deals with multiple host languages, not just HTML
>> 3. involves language definers in safe composition without bloating
>> language specifications
>> 4. provides a path to provable safety from injection for those who
>> want to spend the time constructing the proofs
>>
>> https://www.usenix.org/lets-parse-prevent-pwnage outlines Úlfar's and my
>> attempt to provide such a solution. The basic idea is that we take a
>> language grammar like:
>>
>> HTMLTextNode := ([^<&] | CharacterReference)+;
>> CharacterReference := "&lt;" | "&gt;" | ... | "&#" ([0-9]+) ";" | ...;
>>
>> and add annotations that explain the relationship between substrings and data:
>>
>> HTMLTextNode := @String (@Char [^<&] | CharacterReference)+
>> CharacterReference := @Char{"<"} "&lt;" | @Char{">"} "&gt;" | ... |
>> "&#" (@ScalarCharValue [0-9]+) ";";
>>
>> From such annotated grammars, we can generate code for encoders,
>> decoders, sanitizers, and template context functions in library
>> languages.
>>
>> I've got the encoder generator stuff done, have implemented VMs for
>> the decoders and sanitizers, and am finishing up the template context
>> functions.
>>
>> I have some experience writing, maintaining and debugging such
>> grammars and am confident that the basic approach is workable.
>>
>> Once I've done that, I hope to write code-generator backends for JS,
>> Java, Rust, Python.
>>
>> Then, using a combination of syntactic plug-in points like JS string
>> templates, and Python style % operator overloading, I hope to make
>> syntactically sugary and safe composition ubiquitously available so
>> that the app-developer community will have an answer to
>> code-injection as easy as the "just use prepared statements" advice
>> that is widely dispensed for ad-hoc SQL query creation.
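
For illustration only, the contrast argued over in this thread can be sketched as below: plain template-string interpolation versus a template tag that escapes its substitutions. The sketch is not taken from either proposal or from the libraries linked above. The names escapeHtmlText and safeHtml are hypothetical, and the escaping assumes every substitution lands in an HTML text node or a quoted attribute value. It does nothing for URI attributes or script and style content, which is the gap contextual auto-escaping is meant to close.

```javascript
// Hypothetical helper (not from the thread): escape a value so it is safe
// inside an HTML text node or a quoted attribute value.
function escapeHtmlText(value) {
  const replacements = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#39;',
  };
  return String(value).replace(/[&<>"']/g, (ch) => replacements[ch]);
}

// Hypothetical template tag: the literal chunks written by the author are
// treated as trusted markup, and every substitution is escaped.
function safeHtml(strings, ...values) {
  let out = strings[0];
  values.forEach((value, i) => {
    out += escapeHtmlText(value) + strings[i + 1];
  });
  return out;
}

// The attacker-controlled input used as the example in the thread.
const firstName = "<img onerror='alert(/pwned/)'>";
const lastName = 'Smith';

// Plain interpolation: the markup in firstName survives intact.
const unsafe = `<h1>Welcome ${firstName} ${lastName}!</h1>`;

// Tagged interpolation: the same input is reduced to inert text.
const safe = safeHtml`<h1>Welcome ${firstName} ${lastName}!</h1>`;

console.log(unsafe);
console.log(safe);
// safe is: <h1>Welcome &lt;img onerror=&#39;alert(/pwned/)&#39;&gt; Smith!</h1>
```

Assigning the value of safe to header.innerHTML would render the attacker's string as visible text rather than as an img element. A single fixed escaper like this one is only correct for the contexts it assumes, so it illustrates the tag mechanism rather than a complete defense.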
Received on Friday, 8 March 2013 01:38:08 UTC