Re: E4H and constructing DOMs from Jonas Sicking on 2013-03-08 (public-script-coord@w3.org from January to March 2013)

From: Jonas Sicking <jonas@sicking.cc>
Date: Thu, 7 Mar 2013 19:57:46 -0800
To: mikesamuel@gmail.com
Cc: "public-script-coord@w3.org" <public-script-coord@w3.org>
Message-ID: <CA+c2ei_pMWziiwdtXSTS2vJRNfqDbTxh-51EZxvkbfQErL-=nA@mail.gmail.com>

On Thu, Mar 7, 2013 at 5:55 PM, Mike Samuel <mikesamuel@gmail.com> wrote:
> 2013/3/7 Adam Barth <w3c@adambarth.com>:
>> On Thu, Mar 7, 2013 at 5:18 PM, Adam Barth <w3c@adambarth.com> wrote:
>>> I don't think I fully understood your message because it was quite
>>> long and contained many complex external references.  What I've
>>> understood you to say is that you've managed to work around the
>>> limitations of the current string-based template design by building a
>>> complex mechanism for automatically escaping untrusted data.
>>
>> As an example, in browsing the source code of the autoescaping code
>> you referenced, I found the following line:
>>
>> var HTML_TAG_REGEX_ = /<(?:!|\/?[a-z])(?:[^>'"]|"[^"]*"|'[^']*')*>/gi;
>>
>> As famously written on Stack Overflow [1], "Regex is not a tool that
>> can be used to correctly parse HTML."
>
> That doesn't apply since this is not parsing, it is lexing, and
> regular expressions can be used to lex HTML.

Actually, no, you can't. For example, lexing the contents of <script>
elements is quite complex.
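
A minimal sketch of the problem, assuming Node.js or a browser console.
The regex is the one quoted above; the input string and the variable
names `input` and `matches` are made up for illustration. An HTML
tokenizer treats everything between <script> and </script> as script
data, so the markup inside the string literal is text, not tags:

```javascript
// Regex quoted earlier in this thread.
var HTML_TAG_REGEX_ = /<(?:!|\/?[a-z])(?:[^>'"]|"[^"]*"|'[^']*')*>/gi;

// Hypothetical input: a script whose string literal happens to contain markup.
var input = '<script>var s = "<b>not a tag</b>";</script>';

// Collect every substring the regex considers a tag.
var matches = input.match(HTML_TAG_REGEX_);

// The regex reports four tags: <script>, <b>, </b>, </script>.
// An HTML tokenizer would emit only the <script> start tag and the
// </script> end tag, with the rest as script data.
console.log(matches);
```

This only shows the simplest case. The regex has no notion of which
element it is inside, so it cannot tell script data from markup.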

/ Jonas

Received on Friday, 8 March 2013 03:58:43 UTC