Re: Issues arising from not reparsing from Henri Sivonen on 2009-08-13 (public-html@w3.org from August 2009)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Thu, 13 Aug 2009 10:04:24 +0300
To: Jonas Sicking <jonas@sicking.cc>
Cc: HTMLWG WG <public-html@w3.org>
Message-Id: <6FEEEE1F-818B-4609-BAC9-440B55A303F5@iki.fi>

On Aug 12, 2009, at 21:47, Jonas Sicking wrote:

> On Wed, Aug 12, 2009 at 5:43 AM, Henri Sivonen<hsivonen@iki.fi> wrote:
>> On Aug 12, 2009, at 14:57, Henri Sivonen wrote:
>>
>>> Wiki page created:
>>> http://wiki.whatwg.org/wiki/CDATA_Escapes
>>>
>>> Comments welcome.
>>
>>
>> So this fails due to regexp literals. :-(
>
> I'm not sure I understand.

Regexp literals can contain an unpaired ' or ", and it's really
difficult to tell regexp literals apart from division operators
without a full JS parser. (And a full JS parser wouldn't help with
VBScript.)
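
To illustrate the ambiguity, here is a made-up pair of statements (the
names h and g and the string "it's" are hypothetical, chosen only so
that both lines run). Nearly the same characters are division in one
context and a regexp literal with an unpaired quote in the other:

```javascript
// Hypothetical values, chosen only so that both statements run.
const h = 8, g = 2;

// After an identifier, the slashes are division operators: (h / '2') / g.
// The quotes here are properly paired.
const quotient = h / '2' /g;

// After "(", the slash starts a regexp literal, and that literal
// contains a single unpaired '.
const replaced = "it's".replace(/'/g, "`");

console.log(quotient, replaced);
```

Which reading applies depends on the preceding token, which is why a
scanner that only counts quotes can't decide.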

> Can you give an example of markup that
> fails to parse, and a url where that markup is used?


From Philip:

http://www.zelluloid.de/person/index.php3?id=79678

has all this on one line, so the heuristic would mask the real end tag  
due to the unpaired single quote in the regexp:

<script type="text/javascript">var szu=encodeURIComponent(location.href); var szt=encodeURIComponent(document.title).replace(/\'/g,'`'); var szjsh=(window.location.protocol == 'https:'?'https://ssl.seitzeichen.de/':'http://w3.seitzeichen.de/'); document.write(unescape("%3Cscript src='" + szjsh + "w/86/3c/widget_863ce3df0b6bac66bf9259e95ee3a1bf.js' type='text/javascript'%3E%3C/script%3E"));</script>

Honoring the backslash in the regexp doesn't save the day either,
because quotes don't need escaping inside a regexp. It's permissible
to do stuff like .replace(/\n([^"]+)/g,'');
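
A minimal sketch of how a quote-tracking heuristic gets fooled. The
function name findScriptEnd, the per-line reset of the quote state and
the test strings are my assumptions for illustration. This is not the
algorithm from the wiki page, just a naive scanner that ignores an end
tag appearing inside what it believes is a string literal:

```javascript
// Naive, hypothetical scanner: returns the index of the first
// "</script>" that is not inside a quoted string, or -1 if none.
function findScriptEnd(text) {
  let quote = null; // the currently open quote character, or null
  for (let i = 0; i < text.length; i++) {
    const c = text[i];
    if (c === "\n") {
      quote = null; // assumption: quote state resets at each line break
    } else if (quote !== null) {
      if (c === "\\") i++; // skip the escaped character in a string
      else if (c === quote) quote = null; // string closed
    } else if (c === '"' || c === "'") {
      quote = c; // a string (or so the scanner believes) opens
    } else if (text.startsWith("</script>", i)) {
      return i;
    }
  }
  return -1;
}

// The case the heuristic is meant for: an end tag inside a string.
const inString = "var s = '</script>';</script>";

// An unpaired ' inside a regexp literal, as on the page above.
const withRegexp = "t.replace(/\\'/g,'`');</script>";

// An unescaped " inside a regexp literal; no backslash is involved.
const unescaped = String.raw`t.replace(/\n([^"]+)/g,'');</script>`;

// Same as withRegexp, but with the end tag on its own line.
const twoLines = "t.replace(/\\'/g,'`');\n</script>";

console.log(findScriptEnd(inString));   // index of the second </script>
console.log(findScriptEnd(withRegexp)); // -1
console.log(findScriptEnd(unescaped));  // -1
console.log(findScriptEnd(twoLines));   // index just after the line break
```

The first input is handled as intended. In the second and third, the
scanner takes a quote inside the regexp for the start of a string and
never sees the real end tag. The last input only works because the
end tag is on a different line from the regexp.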

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Thursday, 13 August 2009 07:05:08 UTC