- From: Adam Barth <w3c@adambarth.com>
- Date: Wed, 25 Aug 2010 13:50:14 -0700
== Summary ==

HTML should support Base64-encoded entities to make it easier for authors to include untrusted content in their documents without risking XSS. For example,

  &%SFRNTDUncyA8Y2FudmFzPiBlbGVtZW50IGlzIGF3ZXNvbWUuCg==;

would decode to "HTML5's <canvas> element is awesome." Notice that the < and > characters get emitted by the parser as character tokens. That means they can't be used by an attacker for XSS. These entities can be used safely both in intertag content and in attribute values.

== Use Case ==

Authors often combine trusted and untrusted text into HTML documents. If done naively, an attacker can supply HTML markup, including script, in the untrusted text, resulting in a cross-site scripting attack. Authors want a way to include untrusted content safely in HTML documents without risking XSS.

== Workarounds ==

Currently, authors must carefully escape all untrusted content to prevent an attacker from injecting HTML. Unfortunately, authors often apply the wrong escaping or forget to escape entirely, resulting in security vulnerabilities.

Escaping content in HTML is tricky because authors need to use different escaping rules for different contexts. For example, PHP's htmlspecialchars isn't sufficient in the following contexts:

  <img alt=<?php echo htmlspecialchars($name) ?> src="...">

  <script>
  elmt.innerHTML = 'Hi there <?php echo htmlspecialchars($name) ?>.';
  </script>

Some frameworks convert untrusted content to a series of hex entities, but that greatly increases the length of the content.

== Proposal ==

We should add a new kind of HTML entity that authors can use to include untrusted content. In particular, authors should be able to supply untrusted content in base64, which nicely avoids any scary characters.

We can avoid clashes with existing or future entities by using a new character after the & escape character. In particular, we could use the % character:

  &%SFRNTDUncyA8Y2FudmFzPiBlbGVtZW50IGlzIGF3ZXNvbWUuCg==;

Authors could then supply untrusted content as follows:

  <img alt=<?php echo htmlescape($name) ?> src="...">

where htmlescape is defined as follows:

  function htmlescape($text) {
    return "&%".base64_encode($text).";";
  }

Adam
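To make the round trip in the proposal concrete, here is a minimal sketch in Python (chosen only for illustration; the message's own helper is PHP). It mirrors htmlescape on the authoring side and shows one way a consumer might turn &%...; entities back into text. The UTF-8 encoding, the B64_ENTITY pattern, the decode_entities name, and the choice to leave malformed entities untouched are all assumptions of this sketch, not part of the proposal.

```python
import base64
import re

# Hypothetical pattern for the proposed syntax: "&%", Base64 characters, ";".
B64_ENTITY = re.compile(r"&%([A-Za-z0-9+/]+={0,2});")


def htmlescape(text: str) -> str:
    """Python counterpart of the PHP htmlescape() helper (UTF-8 assumed)."""
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return "&%" + payload + ";"


def decode_entities(source: str) -> str:
    """Replace each &%...; entity with the text it encodes.

    The returned string stands for character tokens only; a real parser
    would never feed it back through markup parsing.
    """
    def replace(match: re.Match) -> str:
        try:
            raw = base64.b64decode(match.group(1), validate=True)
            return raw.decode("utf-8")
        except ValueError:
            # Assumption: malformed Base64 or invalid UTF-8 is left as-is.
            return match.group(0)

    return B64_ENTITY.sub(replace, source)


if __name__ == "__main__":
    name = "<script>alert(1)</script>"
    entity = htmlescape(name)
    print(entity)
    print(decode_entities(entity) == name)
```

Note that the example entity from the message decodes to the quoted sentence followed by a trailing newline, since the encoded bytes end in a line feed.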
Received on Wednesday, 25 August 2010 13:50:14 UTC