- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Thu, 13 Aug 2009 10:04:24 +0300
- To: Jonas Sicking <jonas@sicking.cc>
- Cc: HTMLWG WG <public-html@w3.org>
On Aug 12, 2009, at 21:47, Jonas Sicking wrote:
> On Wed, Aug 12, 2009 at 5:43 AM, Henri Sivonen<hsivonen@iki.fi> wrote:
>> On Aug 12, 2009, at 14:57, Henri Sivonen wrote:
>>
>>> Wiki page created:
>>> http://wiki.whatwg.org/wiki/CDATA_Escapes
>>>
>>> Comments welcome.
>>
>>
>> So this fails due to regexp literals. :-(
>
> I'm not sure I understand.
Regexp literals can contain an unpaired ' or ", and it's really
difficult to tell regexp literals apart from division operators
without a full JS parser. (And a full JS parser wouldn't help with
VBScript.)
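
To make the ambiguity concrete, here is a small sketch. The variable names and values are made up for illustration and do not come from the wiki page or from any real site. The same "/" character is a division operator in one statement and a regexp delimiter in the next, and each regexp holds a single unpaired quote:

```javascript
// Illustrative values only; none of these names come from the thread.
const a = 8, g = 2, b = 1;

// Here both slashes are division operators: (8 / 2) / 1.
const quotient = a / g / b;

// Here the slashes delimit regexp literals, each containing one unpaired quote.
const noApostrophe = "it's".replace(/'/g, "`");
const noDoubleQuote = 'say "hi'.replace(/"/g, "");
```

A scanner that only counts quotes cannot tell which kind of slash it is looking at, so it cannot tell whether the quote that follows opens a string.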
> Can you give an example of markup that
> fails to parse, and a url where that markup is used?
From Philip:
http://www.zelluloid.de/person/index.php3?id=79678
has all this on one line, so the heuristic would mask the real end tag
due to the unpaired single quote in the regexp:
<script type="text/javascript">var
szu=encodeURIComponent(location.href); var
szt=encodeURIComponent(document.title).replace(/\'/g,'`'); var
szjsh=(window.location.protocol == 'https:'?'https://ssl.seitzeichen.de/':'http://w3.seitzeichen.de/')
; document.write(unescape("%3Cscript src='" + szjsh + "w/86/3c/
widget_863ce3df0b6bac66bf9259e95ee3a1bf.js' type='text/javascript'%3E
%3C/script%3E"));</script>
The backslash in this regexp doesn't save the day in general, because
a regexp may just as well contain a quote with no backslash before it,
as in .replace(/\n([^"]+)/g,'');
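
A rough sketch of the failure, assuming a simplified quote-pairing scanner that honors backslash escapes. The function findScriptEnd and both sample strings are hypothetical stand-ins, not the heuristic as written on the wiki page:

```javascript
// Hypothetical scanner: tracks an open quote and accepts "</script" only
// when no quote is open. A backslash skips the character that follows it.
function findScriptEnd(text) {
  let quote = null; // the currently open quote character, or null
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (ch === "\\") {
      i++; // skip the escaped character
      continue;
    }
    if (quote !== null) {
      if (ch === quote) quote = null; // matching quote closes the "string"
    } else if (ch === "'" || ch === '"') {
      quote = ch; // the scanner believes a string starts here
    } else if (text.startsWith("</script", i)) {
      return i; // end tag seen while no quote is open
    }
  }
  return -1; // end tag never seen outside a "string"
}

// The quote in the regexp is backslash-escaped, so the scanner skips it.
const escaped = "s.replace(/\\'/g,'`');</script>";
// The quote in the regexp is not escaped, so the scanner thinks a string opened.
const unescaped = "s.replace(/\\n([^\"]+)/g,'');</script>";

const escapedEnd = findScriptEnd(escaped);
const unescapedEnd = findScriptEnd(unescaped);
```

With these inputs the first call finds the end tag, while the second returns -1 because the " inside the character class is never closed, so the real end tag is masked.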
--
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Thursday, 13 August 2009 07:05:08 UTC