RE: Starting a script block with <!--

Thanks, that clarifies things a lot.

Looking closely at the state engine, however, I can only see that </script>
will not end the script block provided that it has been preceded by both an
unterminated <!-- and an opening <script> tag.

I don't quite see how this meets the requirements in http://wiki.whatwg.org/wiki/CDATA_Escapes


For example, 
  <script><!--
   document.write('<scr'+'ipt></script>');
  //--></script>

It looks to me as though the tokenizer will transition from Script Data Double Escape Start back to Script Data Escaped as soon as the ' following <scr is encountered.

That will prevent entry into Script Data Double Escaped and cause the scanning to exit the script state once the first </script> tag is seen.
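
A minimal sketch of that reading, assuming the three "modes" below are an adequate stand-in for the spec's seventeen or so script data states. The function name findScriptEnd and the mode names are made up for illustration and are not taken from the spec; the input is the text that follows the opening <script> tag, and the sketch ignores everything after the terminating character of the end tag.

```javascript
// Simplified model of the HTML5 "script data" tokenizer states (an
// illustration only, not the spec's character-by-character algorithm).
// Returns the index of the "<" of the end tag that closes the script
// element, or -1 if no end tag closes it.
function findScriptEnd(text) {
  // A tag name only counts when followed by whitespace, "/" or ">".
  const START_TAG = /^<script[\t\n\f \/>]/i;
  const END_TAG = /^<\/script[\t\n\f \/>]/i;
  let mode = "data"; // "data" | "escaped" | "doubleEscaped"
  let i = 0;
  while (i < text.length) {
    const rest = text.slice(i);
    if (mode !== "data" && rest.startsWith("-->")) {
      mode = "data"; // "-->" leaves both escaped modes
      i += 3;
    } else if (mode === "data" && rest.startsWith("<!--")) {
      mode = "escaped";
      i += 2; // keep the "--" so that "<!-->" closes the escape at once
    } else if (mode === "escaped" && START_TAG.test(rest)) {
      mode = "doubleEscaped"; // a complete "<script" inside "<!--"
      i += 7;
    } else if (END_TAG.test(rest)) {
      if (mode !== "doubleEscaped") return i; // this end tag closes the element
      mode = "escaped"; // it only cancels the inner "<script"
      i += 8;
    } else {
      i += 1; // e.g. "<scr'" never completes "<script", so the mode is unchanged
    }
  }
  return -1;
}

// The split-string example from this message.
const split = "<!--\n document.write('<scr'+'ipt></script>');\n//--></script>";
console.log(findScriptEnd(split) === split.indexOf("</script>"));
```

Under this model the split '<scr'+'ipt>' never enters the double escaped mode, so the first </script> closes the element, whereas an unsplit "<script>...</script>" inside <!-- is skipped and the element ends at the final </script>.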

Am I misreading this? I'm looking at http://dev.w3.org/html5/spec/Overview.html.



Herman

-----Original Message-----
From: Philip Taylor [mailto:excors@gmail.com] 
Sent: Sunday, March 21, 2010 2:09 PM
To: Herman Venter
Cc: public-html-comments@w3.org
Subject: Re: Starting a script block with <!--

On Sat, Mar 20, 2010 at 9:17 PM, Herman Venter <hermanv@microsoft.com> wrote:
> Hi
>
> I’m working on a prototype HTML5 parser for research purposes and recently
> bumped into this little bit of markup:
>
> <SCRIPT type=text/javascript><!-- site js --></SCRIPT>
>
> Trawling around the Web it seems that the expectation is that the script
> engine will ignore a (first?) line starting with <!--.
>
> The EcmaScript standard does not provide for this (at least not the last
> time I read through it), so I thought perhaps the issue will be addressed
> in the new HTML5 standard.

http://wiki.whatwg.org/wiki/Web_ECMAScript#HTML_comments lists some
details of this. I think the idea is it has to be handled by the
scripting language spec (not by HTML5), e.g. changing the <script
type> in Firefox can change it from interpreting the <!--...--> as
comments to interpreting them as literal XML comment syntax in E4X,
and the <!-- comment thing works in external .js files too, and so if
ECMAScript doesn't specify this then it's a bug in ECMAScript.

> However, looking at the latter, I find myself hopelessly confused about the
> meaning of the “Script data escape start state” that is entered when <! is
> encountered inside a script tag body.
>
> As best as I can make things out, the end result is that the HTML comment in
> the above example is just passed through to the script engine, which leaves
> the question of what the script engine should do with the non-compliant
> syntax dangling in the air.

It should always pass the text through unchanged - the purpose of this
is to handle cases like

  <script><!--
  document.write("<script>alert(1)</script>");
  alert(2);
  // --></script>

where the inner </script> is 'escaped' by the <!-- so that it doesn't
close the outer script element. The parsing algorithm has become
extremely complicated in order to maximise compatibility with legacy
content while avoiding the reparsing behaviour that most current
implementations have. http://wiki.whatwg.org/wiki/CDATA_Escapes has
some of the earlier attempts at finding a solution.

> I’m also left bemused by the purpose of the “Script data escape start
> state”. Some non-normative text that explains the rationale behind this
> state, amplified by some concrete examples, would be a great improvement to
> the standard.

This is definitely an area where greater clarity would be nice!

> Sincerely
>
> Herman Venter

-- 
Philip Taylor
excors@gmail.com

Received on Friday, 26 March 2010 02:48:25 UTC