- From: Aryeh Gregor <Simetrical+w3c@gmail.com>
- Date: Sun, 30 Jan 2011 15:43:24 -0500
- To: Michal Zalewski <lcamtuf@coredump.cx>, Adam Barth <w3c@adambarth.com>
- Cc: "public-web-security@w3.org" <public-web-security@w3.org>
On Sun, Jan 30, 2011 at 12:43 AM, Michal Zalewski <lcamtuf@coredump.cx> wrote:
> 1) Their performance / memory usage impact will probably render them
> largely impractical to put several dozen or hundred of them on a
> single page - and this is how many bits of untrusted text you may have
> on a page of a typical discussion forum or a mail client.

Is this conceivably avoidable if you want to allow full HTML markup in the untrusted snippets?

> 2) For simple text-only output, the need to apply a specific transform
> to the payload (and do it well) is arguably comparable with the
> difficulty of avoiding XSS in the same scenario.

If it's text-only, simple HTML escaping is the way to do it. It's error-prone, but I don't see any clear advantage to other methods.

On Sun, Jan 30, 2011 at 1:21 AM, Adam Barth <w3c@adambarth.com> wrote:
> One way of doing that is using HTML entities. For example, you could
> use a base64 entity:
>
> &%SGkgTWljaGFsIFphbGV3c2tpCg==;
>
> That would expand to pure text without any fear of injection.
> Alternatively, you can imagine letting pages define their own HTML
> entities in some namespace:
>
> Hi &$username;
>
> where somewhere else we associate $username with "Michal Zalewski".
> Neither of those things requires any particular injustice to HTML
> syntax.

What advantage does that have over just HTML escaping? Would it be any less error-prone? It would be obvious from the markup what was escaped and what wasn't, but that seems to help authors only marginally more than attackers if at all, and doesn't seem worth the implementation effort or source-code obfuscation.
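
For comparison, here is a minimal sketch of the plain HTML-escaping approach for text-only output, written in Python rather than in any particular site's language. The function name `render_comment` and the `<p>` wrapper are assumptions made up for illustration; `html.escape` and `base64.b64decode` are from the Python standard library. The base64 string is the payload from the quoted example, decoded only to show what text the proposed entity would stand for.

```python
import base64
import html


def render_comment(untrusted: str) -> str:
    """Wrap untrusted text in a <p> element (hypothetical wrapper),
    escaping &, <, > and both quote characters first."""
    return "<p>" + html.escape(untrusted, quote=True) + "</p>"


# The payload from the quoted base64-entity example, decoded to plain text.
decoded = base64.b64decode("SGkgTWljaGFsIFphbGV3c2tpCg==").decode("utf-8")

if __name__ == "__main__":
    print(repr(decoded))
    print(render_comment(decoded.strip()))
    print(render_comment("<script>alert(1)</script>"))
```

Either way the author has to remember to apply a transform at every output point; the escaping call is no harder to forget than the base64 encoding would be.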
On Sun, Jan 30, 2011 at 2:31 PM, Michal Zalewski <lcamtuf@coredump.cx> wrote:
> I think the only realistic way we can eventually have this is to have
> a method for delivering DOM tree directly to the browser, without the
> need to parse it on every client (which, if you come think about it,
> is a remarkable waste of CPU resources); this would give a lot more
> freedom to simple web frameworks to tackle XSS.

Sites could already do this by just constructing their output using DOM methods instead of string manipulation, then serializing it for transmission. Whether the *transmission* format is binary or text is orthogonal to any security concerns, AFAICT.

In practice, people use string manipulation because it's much more convenient than DOM manipulation in common web programming languages. I've been told Facebook has extended PHP to have XML string literals, so you can do something like

    echo <span>Hello, $username!</span>;

and it will parse <span>Hello, $username!</span> as an XML literal fragment, substituting $username according to normal PHP rules, but HTML-escaping it first. Tainting systems (e.g., <http://wiki.php.net/rfc/taint>) could serve a similar purpose. But this is all a tools problem -- standards can't do anything about how authors write pages.

(Conceivably it would make sense for *performance* reasons to define a binary format for encoding DOMs, and use that as a transmission format. But that doesn't affect security AFAICT.)
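
To illustrate building output as a tree and serializing it, here is a rough sketch in Python, with the standard library's `xml.etree.ElementTree` standing in for DOM methods. It is not the PHP extension mentioned above, and the function name `greeting` is hypothetical. The point is only that the untrusted value is assigned as a text node, so the serializer escapes it and the author never concatenates markup by hand.

```python
import xml.etree.ElementTree as ET


def greeting(username: str) -> str:
    """Build <span>Hello, USERNAME!</span> as a tree, then serialize it.

    The username is set as element text, never spliced into markup,
    so the serializer escapes &, < and > on output.
    """
    span = ET.Element("span")
    span.text = f"Hello, {username}!"
    return ET.tostring(span, encoding="unicode", method="html")


if __name__ == "__main__":
    print(greeting("Michal Zalewski"))
    print(greeting("<script>alert(1)</script>"))
```

The serialized form here is still text; swapping in a binary encoding of the same tree would change the transmission cost, not the escaping behavior.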
Received on Sunday, 30 January 2011 20:44:18 UTC