- From: Herman Venter <hermanv@microsoft.com>
- Date: Fri, 26 Mar 2010 02:47:49 +0000
- To: Philip Taylor <excors@gmail.com>
- CC: "public-html-comments@w3.org" <public-html-comments@w3.org>
Thanks, that clarifies things a lot.

Looking closely at the state engine, however, I can only see that </script> will not end the script block provided that it has been preceded by both an unterminated <!-- and an opening <script> tag. I don't quite see how this meets the requirements in http://wiki.whatwg.org/wiki/CDATA_Escapes

For example,

<script><!-- document.write('<scr'+'ipt></script>'); //--></script>

looks to me like it will transition from Script Data Double Escape Start to Script Data Escaped as soon as the ' following <scr is encountered. That will prevent entry into Script Data Double Escaped and cause the scanning to exit the script state once the first </script> tag is seen.

Am I misreading this? I'm looking at http://dev.w3.org/html5/spec/Overview.html.

Herman

-----Original Message-----
From: Philip Taylor [mailto:excors@gmail.com]
Sent: Sunday, March 21, 2010 2:09 PM
To: Herman Venter
Cc: public-html-comments@w3.org
Subject: Re: Starting a script block with <!--

On Sat, Mar 20, 2010 at 9:17 PM, Herman Venter <hermanv@microsoft.com> wrote:
> Hi
>
> I’m working on a prototype HTML5 parser for research purposes and recently
> bumped into this little bit of markup:
>
> <SCRIPT type=text/javascript><!-- site js --></SCRIPT>
>
> Trawling around the Web it seems that the expectation is that the script
> engine will ignore a (first?) line starting with <!--.
>
> The EcmaScript standard does not provide for this (at least not the last
> time I’ve read through it), so I thought perhaps the issue will be addressed
> in the new HTML5 standard.

http://wiki.whatwg.org/wiki/Web_ECMAScript#HTML_comments lists some details of this. I think the idea is it has to be handled by the scripting language spec (not by HTML5), e.g. changing the <script type> in Firefox can change it from interpreting the <!--...--> as comments to interpreting them as literal XML comment syntax in E4X, and the <!-- comment thing works in external .js files too, and so if ECMAScript doesn't specify this then it's a bug in ECMAScript.

> However, looking at the latter, I find myself hopelessly confused about the
> meaning of the “Script data escape start state” that is entered when <! is
> encountered inside a script tag body.
>
> As best as I can make things out, the end result is that the HTML comment in
> the above example is just passed through to the script engine, which leaves
> the question of what the script engine should do with the non-compliant
> syntax dangling in the air.

It should always pass the text through unchanged - the purpose of this is to handle cases like

<script><!-- document.write("<script>alert(1)</script>"); alert(2); // --></script>

where the inner </script> is 'escaped' by the <!-- so that it doesn't close the outer script element.

The parsing algorithm has become extremely complicated in order to maximise compatibility with legacy content while avoiding the reparsing behaviour that most current implementations have. http://wiki.whatwg.org/wiki/CDATA_Escapes has some of the earlier attempts at finding a solution.

> I’m also left bemused by the purpose of the “Script data escape start
> state”. Some non-normative text that explains the rationale behind this
> state, amplified by some concrete examples, would be a great improvement to
> the standard.

This is definitely an area where greater clarity would be nice!

> Sincerely
>
> Herman Venter

--
Philip Taylor
excors@gmail.com
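
For readers who want to trace the two examples in this thread by hand, here is a simplified Python sketch of the escaping behaviour under discussion. It is not the specification's algorithm: it collapses the "Script Data", "Script Data Escaped" and "Script Data Double Escaped" families of states into three modes plus a dash counter. It also assumes the states behave as in the current HTML tokenizer, which may differ in detail from the March 2010 draft. The names `script_end`, `_tag_at` and `TERMINATORS` are hypothetical and chosen only for illustration. The input is the text that follows the opening <script> tag.

```python
# Characters that may follow a tag name: whitespace, "/" or ">".
TERMINATORS = " \t\n\f\r/>"


def _tag_at(text, i, prefix):
    """True if text[i:] is prefix + 'script' + a terminator (case-insensitive)."""
    j = i + len(prefix) + 6  # 6 == len("script")
    return (text[i:j].lower() == prefix + "script"
            and j < len(text) and text[j] in TERMINATORS)


def script_end(text):
    """Return the offset of the </script> that closes the element, or None.

    Simplified model (an assumption, not the normative algorithm):
      data           -> "<!--" enters escaped; "</script>" closes the element
      escaped        -> "<script>" enters double_escaped; "</script>" closes
      double_escaped -> "</script>" only drops back to escaped
    In both escaped modes, "-->" returns to data.
    """
    mode, dashes, i = "data", 0, 0
    while i < len(text):
        ch = text[i]
        if mode == "data":
            if text.startswith("<!--", i):
                # The two dashes of "<!--" already count towards "-->".
                mode, dashes = "escaped", 2
                i += 4
                continue
            if _tag_at(text, i, "</"):
                return i
            i += 1
            continue
        # escaped or double_escaped
        if ch == "-":
            dashes = min(dashes + 1, 2)
        else:
            if ch == ">" and dashes == 2:
                mode = "data"
            elif ch == "<":
                if mode == "escaped":
                    if _tag_at(text, i, "</"):
                        return i
                    if _tag_at(text, i, "<"):
                        mode = "double_escaped"
                elif _tag_at(text, i, "</"):
                    mode = "escaped"
            dashes = 0
        i += 1
    return None


philip = '<!-- document.write("<script>alert(1)</script>"); alert(2); // --></script>'
herman = "<!-- document.write('<scr'+'ipt></script>'); //--></script>"

print(philip[:script_end(philip)])
print(herman[:script_end(herman)])
```

Under these assumptions, the first example keeps the inner </script> inside the script text, because the literal <script> moves the model into the double-escaped mode. In the second example the ' after <scr means the double-escaped mode is never entered, so the first </script> closes the element. That matches the reading given at the top of this message.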
Received on Friday, 26 March 2010 02:48:25 UTC