Re: E4H and constructing DOMs from Mike Samuel on 2013-03-08 (public-script-coord@w3.org from January to March 2013)

From: Mike Samuel <mikesamuel@gmail.com>
Date: Fri, 8 Mar 2013 00:07:54 -0500
To: Maciej Stachowiak <mjs@apple.com>
Cc: "Mark S. Miller" <erights@google.com>, Jonas Sicking <jonas@sicking.cc>, "public-script-coord@w3.org" <public-script-coord@w3.org>
Message-ID: <CACod6GttU9jFLwV4NJn=iFRNYK7aqo0ZgegD1MTCmP5MdO4Kwg@mail.gmail.com>

2013/3/7 Maciej Stachowiak <mjs@apple.com>:
>
> https://code.google.com/p/google-caja/issues/detail?id=1670
>
> I strongly suspect there are more bugs than the one I found, as the regexp
> looks way too simple to capture the full behavior of the relevant HTML
> tokenizer states. Regrettably I do not have the time or expertise to hunt
> for more.

Here's the context:

https://code.google.com/p/js-quasis-libraries-and-repl/source/browse/trunk/js/escapers.js#167

The regexp is used in a function that strips tags out of a string
before '<' and '>' are escaped as '&lt;' and '&gt;'.

This is so that accidentally including a string of "known-safe HTML"
(not an untrusted input) in the value of an HTML attribute doesn't
cause tags to appear in, e.g., title hover text.  This is not part of
the TCB.
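
Roughly, the strip-then-escape step looks like the sketch below.  This
is not the code in escapers.js; the names TAG_LIKE, stripTags,
escapeAngleBrackets and htmlToAttributeText are made up for
illustration, and the pattern is a deliberately simple stand-in that
does not model the HTML tokenizer.

```javascript
// Hypothetical sketch only; not the actual regexp or functions from escapers.js.
// Matches '<', an optional '/', a letter, then everything up to the next '>'.
// Known limitation: a '>' inside a quoted attribute value ends the match early.
const TAG_LIKE = /<\/?[a-zA-Z][^>]*>/g;

// Remove anything that looks like a tag, keeping the text between tags.
function stripTags(html) {
  return html.replace(TAG_LIKE, '');
}

// Escape only the angle brackets, as described in the message.
// A full attribute escaper would also need to handle '&' and quotes.
function escapeAngleBrackets(text) {
  return text.replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Strip tags first, then escape whatever '<' and '>' remain.
function htmlToAttributeText(knownSafeHtml) {
  return escapeAngleBrackets(stripTags(knownSafeHtml));
}

console.log(htmlToAttributeText('<b>Hello</b> world'));
console.log(htmlToAttributeText('1 < 2 > 0'));
```

Because escaping runs after stripping, anything the pattern misses
still comes out as '&lt;' or '&gt;' rather than as a live tag; a miss
shows up as stray text in the attribute value, not as markup.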

I suspect there are other bugs too, as there always are in software,
and as there will be in any AST solution as well.

Received on Friday, 8 March 2013 05:08:21 UTC