- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Thu, 13 Aug 2009 10:04:24 +0300
- To: Jonas Sicking <jonas@sicking.cc>
- Cc: HTMLWG WG <public-html@w3.org>
On Aug 12, 2009, at 21:47, Jonas Sicking wrote:

> On Wed, Aug 12, 2009 at 5:43 AM, Henri Sivonen<hsivonen@iki.fi> wrote:
>> On Aug 12, 2009, at 14:57, Henri Sivonen wrote:
>>
>>> Wiki page created:
>>> http://wiki.whatwg.org/wiki/CDATA_Escapes
>>>
>>> Comments welcome.
>>
>>
>> So this fails due to regexp literals. :-(
>
> I'm not sure I understand.

Regexp literals can contain an unpaired ' or ", and it's really difficult to tell regexp literals apart from divisions without a full JS parser. (And a full JS parser wouldn't help with VBScript.)

> Can you give an example of markup that
> fails to parse, and a url where that markup is used?

From Philip:
http://www.zelluloid.de/person/index.php3?id=79678 has all this on one line, so the heuristic would mask the real end tag due to the unpaired single quote in the regexp:

<script type="text/javascript">var szu=encodeURIComponent(location.href); var szt=encodeURIComponent(document.title).replace(/\'/g,'`'); var szjsh=(window.location.protocol == 'https:'?'https://ssl.seitzeichen.de/':'http://w3.seitzeichen.de/') ; document.write(unescape("%3Cscript src='" + szjsh + "w/86/3c/widget_863ce3df0b6bac66bf9259e95ee3a1bf.js' type='text/javascript'%3E %3C/script%3E"));</script>

The backslash in the regexp doesn't save the day, because it's permissible to do stuff like

.replace(/\n([^"]+)/g,'');

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
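To make the failure concrete, here is a minimal JavaScript sketch of a quote-pairing scanner. It assumes the heuristic tracks ' and " pairs and ignores any </script that appears while a quote is open. The function name findScriptEnd and its exact rules are hypothetical illustrations, not the rules from the wiki page, and the sketch ignores comments and the <!-- escape entirely.

```javascript
// A hypothetical, simplified quote-tracking heuristic (an assumption made for
// illustration; it is not necessarily the exact rule set on the wiki page).
// It returns the index of the first "</script" that appears outside a
// single- or double-quoted string, or -1 if every occurrence is masked.
function findScriptEnd(text) {
  let quote = null; // currently open quote character, or null when outside
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (quote !== null) {
      if (ch === "\\") {
        i++; // a backslash inside a string escapes the next character
      } else if (ch === quote) {
        quote = null; // the matching quote closes the string
      }
      // Any "</script" seen here is treated as string content and masked.
    } else if (ch === "'" || ch === '"') {
      quote = ch; // an opening quote starts a string
    } else if (text.slice(i, i + 8).toLowerCase() === "</script") {
      return i; // an unquoted end tag is taken as the real one
    }
  }
  return -1;
}

// The intended case: "</script>" inside a string literal is skipped.
const quoted = 'var a = "</script>"; b();</script>';
console.log(findScriptEnd(quoted) === quoted.lastIndexOf("</script")); // true

// The failing case, shortened from the zelluloid.de page: the lone ' in
// the regexp /\'/g opens a "string" that is never closed.
const fromPage =
  "var szt=encodeURIComponent(document.title).replace(/\\'/g,'`');" +
  ' document.write("x");</script>';
console.log(findScriptEnd(fromPage)); // -1

// An unescaped " inside a regexp character class fails the same way.
const charClass = "s = s.replace(/\\n([^\"]+)/g,'');</script>";
console.log(findScriptEnd(charClass)); // -1
```

The scanner cannot tell that the / before the quote starts a regexp literal rather than a division, so it has no way to know the quote is not a string delimiter. Treating a backslash-preceded quote as escaped would rescue the second input but not the third, whose " has no backslash in front of it.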
Received on Thursday, 13 August 2009 07:05:08 UTC