Re: E4H and constructing DOMs

2013/3/7 Maciej Stachowiak:
> I strongly suspect there are more bugs than the one I found, as the regexp
> looks way too simple to capture the full behavior of the relevant HTML
> tokenizer states. Regrettably I do not have the time or expertise to hunt
> for more.

Here's the context:

The regexp is used in a function that strips tags out of a string
before '<' and '>' are escaped as '&lt;' and '&gt;'.

This is so that accidentally including a string of "known-safe HTML"
(not an untrusted input) in the value of an HTML attribute doesn't
cause tags to appear in, e.g., title hover text.  This is not part of
the TCB (trusted computing base).
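
To make the strip-then-escape ordering concrete, here is a minimal
sketch in Python.  The function name and the tag pattern are
hypothetical stand-ins chosen for illustration; this is not the regexp
under discussion, only the general shape of the approach.

```python
import re

# Hypothetical, deliberately simple tag pattern (an assumption for this
# sketch, not the regexp discussed in the thread).
_TAG = re.compile(r"<[^>]*>")


def strip_tags_then_escape(known_safe_html: str) -> str:
    """Remove anything that looks like a tag, then escape '<' and '>'.

    Only the two characters named above are escaped here; a real
    attribute-value escaper would also have to deal with quotes.
    """
    text = _TAG.sub("", known_safe_html)
    # Whatever the pattern missed is neutralized by this second step.
    return text.replace("<", "&lt;").replace(">", "&gt;")


print(strip_tags_then_escape("<b>Hello</b> world"))   # Hello world
print(strip_tags_then_escape("a < b"))                # a &lt; b
print(strip_tags_then_escape('<a title=">">x</a>'))   # "&gt;x
```

The last call shows the kind of input a simple pattern gets wrong: the
'>' inside the quoted attribute ends the match early.  The leftover
text is cosmetically wrong, but the escaping step still runs afterwards,
so no '<' or '>' survives into the output.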

I suspect there are other bugs too, as there always are in software,
and as there will be in any AST solution as well.

Received on Friday, 8 March 2013 05:08:21 UTC