Re: CSP XML Data with tokens from Aryeh Gregor on 2011-01-30 (public-web-security@w3.org from January 2011)

From: Aryeh Gregor <Simetrical+w3c@gmail.com>
Date: Sun, 30 Jan 2011 15:43:24 -0500
To: Michal Zalewski <lcamtuf@coredump.cx>, Adam Barth <w3c@adambarth.com>
Cc: "public-web-security@w3.org" <public-web-security@w3.org>
Message-ID: <AANLkTinxK1R0xk-L2afcfRf9+F_dfjk_=54YSPbG49aa@mail.gmail.com>

On Sun, Jan 30, 2011 at 12:43 AM, Michal Zalewski <lcamtuf@coredump.cx> wrote:
> 1) Their performance / memory usage impact will probably render them
> largely impractical to put several dozen or hundred of them on a
> single page - and this is how many bits of untrusted text you may have
> on a page of a typical discussion forum or a mail client.

Is this conceivably avoidable if you want to allow full HTML markup in
the untrusted snippets?

> 2) For simple text-only output, the need to apply a specific transform
> to the payload (and do it well) is arguably comparable with the
> difficulty of avoiding XSS in the same scenario.

If it's text-only, simple HTML escaping is the way to do it.  It's
error-prone, but I don't see any clear advantage to other methods.
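
To be concrete about what "simple HTML escaping" means here, a minimal
Python sketch (render_comment is a made-up name for illustration, and
the <p> wrapper is just an assumed text context):

```python
import html


def render_comment(untrusted: str) -> str:
    """Wrap untrusted text in a <p>, escaping it for an HTML text context."""
    # html.escape rewrites &, < and >; with quote=True (the default) it
    # also rewrites " and ', so the result is usable in quoted attributes.
    return "<p>" + html.escape(untrusted) + "</p>"


if __name__ == "__main__":
    print(render_comment('<script>alert(1)</script> & "friends"'))
```

The error-prone part is not the transform itself but remembering to
apply it on every output path, and only in contexts where it is the
right transform (it does not make a value safe inside a script block
or a URL attribute, for instance).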

On Sun, Jan 30, 2011 at 1:21 AM, Adam Barth <w3c@adambarth.com> wrote:
> One way of doing that is using HTML entities.  For example, you could
> use a base64 entity:
>
> &%SGkgTWljaGFsIFphbGV3c2tpCg==;
>
> That would expand to pure text without any fear of injection.
> Alternatively, you can imagine letting pages define their own HTML
> entities in some namespace:
>
> Hi &$username;
>
> where somewhere else we associate $username with "Michal Zalewski".
> Neither of those things requires any particular injustice to HTML
> syntax.

What advantage does that have over just HTML escaping?  Would it be
any less error-prone?  It would be obvious from the markup what was
escaped and what wasn't, but that seems to help authors only
marginally more than attackers if at all, and doesn't seem worth the
implementation effort or source-code obfuscation.
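
A rough sketch of the comparison, assuming the "&%...;" syntax from
Adam's example (no browser implements it; expand_base64_entity is a
hypothetical stand-in for what one would do):

```python
import base64
import html


def expand_base64_entity(entity: str) -> str:
    """Expand a hypothetical '&%<base64>;' entity into HTML-safe text."""
    if not (entity.startswith("&%") and entity.endswith(";")):
        raise ValueError("not a base64 entity")
    text = base64.b64decode(entity[2:-1], validate=True).decode("utf-8")
    # A browser would insert the decoded string as a text node; when
    # serialized back to markup, that is the same as escaping it.
    return html.escape(text)


# What the server would have to do for every untrusted value.
payload = "Hi Michal Zalewski\n"
entity = "&%" + base64.b64encode(payload.encode("utf-8")).decode("ascii") + ";"

if __name__ == "__main__":
    print(entity)
    print(expand_base64_entity(entity) == html.escape(payload))
```

Either way the author has to remember to run each untrusted value
through a transform before it goes into the page, so the failure mode
(forgetting one) looks the same as with plain escaping.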

On Sun, Jan 30, 2011 at 2:31 PM, Michal Zalewski <lcamtuf@coredump.cx> wrote:
> I think the only realistic way we can eventually have this is to have
> a method for delivering DOM tree directly to the browser, without the
> need to parse it on every client (which, if you come to think about it,
> is a remarkable waste of CPU resources);  this would give a lot more
> freedom to simple web frameworks to tackle XSS.

Sites could already do this by just constructing their output using
DOM methods instead of string manipulation, then serializing it for
transmission.  Whether the *transmission* format is binary or text is
orthogonal to any security concerns, AFAICT.  In practice, people use
string manipulation because it's much more convenient than DOM
manipulation in common web programming languages.  I've been told
Facebook has extended PHP to have XML string literals (XHP), so you
can do something like

echo <span>Hello, {$username}!</span>;

and it will parse <span>Hello, {$username}!</span> as an XML literal
fragment, evaluating the expression in braces according to normal PHP
rules, but HTML-escaping the result first.  Tainting systems (e.g.,
<http://wiki.php.net/rfc/taint>) could serve a similar purpose.  But
this is all a tools problem -- standards can't do anything about how
authors write pages.
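
For illustration, here is roughly what "construct with DOM methods,
then serialize" looks like, sketched in Python with the standard
library's ElementTree (greeting is a made-up function name, and
ElementTree emits XML rather than HTML, which is close enough for a
plain <span>):

```python
import xml.etree.ElementTree as ET


def greeting(username: str) -> str:
    """Build <span>Hello, NAME!</span> as a tree, then serialize it."""
    span = ET.Element("span")
    # The untrusted value is stored as text content, never parsed as
    # markup; the serializer escapes &, < and > when writing it out.
    span.text = f"Hello, {username}!"
    return ET.tostring(span, encoding="unicode")


if __name__ == "__main__":
    print(greeting("<script>alert(1)</script>"))
```

The escaping happens in the serializer, so there is no per-value step
to forget.  The cost is the extra verbosity compared to string
concatenation, which is the convenience problem described above.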

(Conceivably it would make sense for *performance* reasons to define a
binary format for encoding DOMs, and use that as a transmission
format.  But that doesn't affect security AFAICT.)

Received on Sunday, 30 January 2011 20:44:18 UTC