- From: Mark S. Miller <erights@google.com>
- Date: Thu, 7 Mar 2013 20:36:18 -0800
- To: Maciej Stachowiak <mjs@apple.com>
- Cc: Jonas Sicking <jonas@sicking.cc>, Mike Samuel <mikesamuel@gmail.com>, "public-script-coord@w3.org" <public-script-coord@w3.org>
- Message-ID: <CABHxS9hvjPB_DTGCyq+0w3Euh26UEGRy1adnf3ZYOsjDNsDj2A@mail.gmail.com>
Hi Maciej,

Please report it at https://code.google.com/p/google-caja/issues/list and select the Template: Private Issue. Thanks!

On Thu, Mar 7, 2013 at 8:22 PM, Maciej Stachowiak <mjs@apple.com> wrote:

> On Mar 7, 2013, at 7:57 PM, Jonas Sicking <jonas@sicking.cc> wrote:
>
> On Thu, Mar 7, 2013 at 5:55 PM, Mike Samuel <mikesamuel@gmail.com> wrote:
>
> 2013/3/7 Adam Barth <w3c@adambarth.com>:
>
> On Thu, Mar 7, 2013 at 5:18 PM, Adam Barth <w3c@adambarth.com> wrote:
>
> I don't think I fully understood your message because it was quite
> long and contained many complex external references. What I've
> understood you to say is that you've managed to work around the
> limitations of the current string-based template design by building a
> complex mechanism for automatically escaping untrusted data.
>
> As an example, in browsing the source code of the autoescaping code
> you referenced, I found the following line:
>
> var HTML_TAG_REGEX_ = /<(?:!|\/?[a-z])(?:[^>'"]|"[^"]*"|'[^']*')*>/gi;
>
> As famously written on Stack Overflow [1], "Regex is not a tool that
> can be used to correctly parse HTML."
>
> That doesn't apply since this is not parsing, it is lexing, and
> regular expressions can be used to lex HTML.
>
> Actually, no you can't. For example the lexing of contents of <script>
> elements is quite complex.
>
> For further reference, tokenizing HTML looks like this:
> <http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#tokenization>.
>
> It superficially looks like an FSM, so it seems tempting to process it
> with a regexp, but interaction with tree construction makes it non-regular.
>
> Even if you ignore the non-regular bits, translating it to a regexp is
> hard. For example, with a few minutes study I found a string that the HTML
> spec and all browsers treat as an HTML open tag which is not matched by the
> regexp that Adam quoted. I assume this is likely a security flaw in the
> library it comes from. I am not sure if it's ok to post bug reports here or
> if there is some private channel to disclose the security bug; I'll gladly
> report it if someone tells me how.
>
> Regards,
> Maciej

--
Cheers,
--MarkM
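
To illustrate the point raised in the quoted thread about <script> contents, here is a minimal JavaScript sketch. It applies the regular expression quoted above to a small input and shows that it reports a tag inside a script's string literal, whereas the HTML tokenization rules treat everything up to the closing </script> as script text. The sample input and the names `sample` and `matches` are assumptions made for illustration. This is not the string Maciej refers to, and it does not by itself show a vulnerability, since that depends on how the library uses the regular expression.

```javascript
// The regular expression quoted in the thread, copied verbatim.
var HTML_TAG_REGEX_ = /<(?:!|\/?[a-z])(?:[^>'"]|"[^"]*"|'[^']*')*>/gi;

// Hypothetical input (an assumption for this sketch): a script element
// whose body contains a string literal that merely looks like a tag.
var sample = '<script>var s = "<b>";</script>';

// With the "g" flag, String.prototype.match returns every match.
var matches = sample.match(HTML_TAG_REGEX_);

// An HTML tokenizer would emit a <script> start tag, the script text
// `var s = "<b>";`, and a </script> end tag. The regex has no notion of
// being "inside a script", so it also reports "<b>".
console.log(matches); // [ '<script>', '<b>', '</script>' ]
```

The difference comes from state: the tokenizer switches to a script-specific state after the start tag, while a single regular expression applies the same pattern at every position.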
Received on Friday, 8 March 2013 04:36:46 UTC