Re: Issues arising from not reparsing

On Aug 12, 2009, at 21:47, Jonas Sicking wrote:

> On Wed, Aug 12, 2009 at 5:43 AM, Henri Sivonen wrote:
>> On Aug 12, 2009, at 14:57, Henri Sivonen wrote:
>>> Wiki page created:
>>> Comments welcome.
>> So this fails due to regexp literals. :-(
> I'm not sure I understand.

Regexp literals can contain an unpaired ' or ", and it's really
difficult to tell a regexp literal apart from a division without a
full JS parser. (And a full JS parser wouldn't help with VBScript.)

> Can you give an example of markup that
> fails to parse, and a url where that markup is used?

From Philip:

The page has all this on one line, so the heuristic would mask the
real end tag due to the unpaired single quote in the regexp:

<script type="text/javascript">var szu=encodeURIComponent(location.href); var szt=encodeURIComponent(document.title).replace(/\'/g,'`'); var szjsh=(window.location.protocol == 'https:'?'':'') ; document.write(unescape("%3Cscript src='" + szjsh + "w/86/3c/widget_863ce3df0b6bac66bf9259e95ee3a1bf.js' type='text/javascript'%3E

The backslash in the regexp doesn't save the day, because a quote
inside a regexp doesn't have to be escaped; it's permissible to do
stuff like .replace(/\n([^"]+)/g,''); where the " has no backslash
in front of it.
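A minimal sketch of the failure, assuming a quote-tracking heuristic of
roughly this shape. The function name findScriptEnd and its exact rules
are hypothetical, not taken from the wiki page: it skips '...' and
"..." literals, honors backslash escapes, forgets an open quote at a
line break (as "all this on one line" suggests), and stops at the first
</script outside a literal. The backslash rule copes with Philip's
/\'/g, but the unescaped " in /\n([^"]+)/g still swallows the real end
tag unless a line break intervenes.

```javascript
// Hypothetical end-tag heuristic, for illustration only (not necessarily
// the rule on the wiki page).
function findScriptEnd(body) {
  const lower = body.toLowerCase(); // end tags are case-insensitive
  let quote = null; // currently open quote character, or null
  for (let i = 0; i < body.length; i++) {
    const c = body[i];
    if (c === "\n") {
      quote = null; // assumption: quote state does not span lines
    } else if (c === "\\") {
      i++; // skip the escaped character, inside or outside a literal
    } else if (quote !== null) {
      if (c === quote) quote = null; // literal closed
    } else if (c === "'" || c === '"') {
      quote = c; // literal opened
    } else if (lower.startsWith("</script", i)) {
      return i; // end tag outside any literal
    }
  }
  return -1; // no end tag found: the real one was masked
}

// Philip's pattern: the quote in /\'/ is escaped, so the end tag is found.
const escaped = "s.replace(/\\'/g,'`');</script>";
// No backslash before the " in the character class: the end tag is masked.
const unescaped = "s.replace(/\\n([^\"]+)/g,'');</script>";
// Same code with a line break before the end tag: the open quote is dropped.
const split = unescaped.replace("</script>", "\n</script>");

console.log(findScriptEnd(escaped) === escaped.indexOf("</script")); // true
console.log(findScriptEnd(unescaped)); // -1
console.log(findScriptEnd(split) === split.indexOf("</script")); // true
```

Teaching the scanner to skip regexp literals would only help if it
could tell them apart from divisions, which brings back the need for a
full JS parser.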

Henri Sivonen

Received on Thursday, 13 August 2009 07:05:08 UTC