Re: Starting a script block with <!--

From: Philip Taylor <excors@gmail.com>
Date: Sun, 21 Mar 2010 21:09:10 +0000
Message-ID: <ea09c0d11003211409s43105fc4g42856c4087c29c9e@mail.gmail.com>
To: Herman Venter <hermanv@microsoft.com>
Cc: "public-html-comments@w3.org" <public-html-comments@w3.org>

On Sat, Mar 20, 2010 at 9:17 PM, Herman Venter <hermanv@microsoft.com> wrote:
> Hi
>
> I’m working on a prototype HTML5 parser for research purposes and recently
> bumped into this little bit of markup:
>
> <SCRIPT type=text/javascript><!-- site js --></SCRIPT>
>
> Trawling around the Web it seems that the expectation is that the script
> engine will ignore a (first?) line starting with <!--.
>
> The ECMAScript standard does not provide for this (at least not the last
> time I read through it), so I thought perhaps the issue would be addressed
> in the new HTML5 standard.

http://wiki.whatwg.org/wiki/Web_ECMAScript#HTML_comments lists some
details of this. I think the idea is that it has to be handled by the
scripting language spec (not by HTML5), for two reasons. First,
changing the <script type> in Firefox can change it from interpreting
<!--...--> as a comment to interpreting it as literal XML comment
syntax in E4X. Second, the <!-- comment handling works in external .js
files too. So if ECMAScript doesn't specify this, then it's a bug in
ECMAScript.
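
A quick way to see the script-level behaviour is to hand such text
straight to the script engine, with no HTML parser involved. This is
only a sketch: it assumes an engine that treats <!-- like a
single-line comment in classic (non-module) scripts, and the variable
names are made up for illustration.

```javascript
// Assumption: the engine accepts HTML-like comments in classic scripts.
const source = [
  "<!-- site js -->", // assumed to be skipped up to the end of the line
  "var total = 6 * 7;",
  "total;",
].join("\n");

// Indirect eval evaluates the text as a classic (non-module) script,
// much as it would be for an external .js file.
const result = (0, eval)(source);
console.log(result);
```

If the engine does not implement this, the eval call throws a
SyntaxError at the first line instead.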

> However, looking at the latter, I find myself hopelessly confused about the
> meaning of the “Script data escape start state” that is entered when <! is
> encountered inside a script tag body.
>
> As best as I can make things out, the end result is that the HTML comment in
> the above example is just passed through to the script engine, which leaves
> the question of what the script engine should do with the non-compliant
> syntax dangling in the air.

The HTML parser should always pass the text through unchanged - the
purpose of the escape states is to handle cases like

  <script><!--
  document.write("<script>alert(1)</script>");
  alert(2);
  // --></script>

where the inner </script> is 'escaped' by the <!-- so that it doesn't
close the outer script element. The parsing algorithm has become
extremely complicated in order to maximise compatibility with legacy
content while avoiding the reparsing behaviour that most current
implementations have. http://wiki.whatwg.org/wiki/CDATA_Escapes has
some of the earlier attempts at finding a solution.
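
To make the escaping concrete, here is a rough sketch of how a parser
could locate the </script> that really closes the element. It is a
simplification based on my reading of the tokenizer states, not the
spec algorithm itself: findScriptEnd, isTag and the three state names
are hypothetical, and it takes the text that follows the <script>
start tag.

```javascript
// Hypothetical helper: true if `prefix` occurs at index i and is followed
// by whitespace, "/" or ">", so that it reads as a complete tag name.
function isTag(text, i, prefix) {
  return text.startsWith(prefix, i) &&
    /[\t\n\f\r \/>]/.test(text.charAt(i + prefix.length));
}

// Returns the index of the </script> that closes the element, or -1.
function findScriptEnd(html) {
  const lower = html.toLowerCase();
  let state = "data"; // "data" | "escaped" | "doubleEscaped"
  let i = 0;
  while (i < lower.length) {
    if (state === "data" && lower.startsWith("<!--", i)) {
      state = "escaped";
      i += 2; // leave the "--" in place so that "<!-->" ends the escape
    } else if (state !== "data" && lower.startsWith("-->", i)) {
      state = "data";
      i += 3;
    } else if (isTag(lower, i, "</script")) {
      if (state !== "doubleEscaped") return i; // the real end tag
      state = "escaped"; // closes only the inner, escaped <script>
      i += 8;
    } else if (state === "escaped" && isTag(lower, i, "<script")) {
      state = "doubleEscaped"; // the next </script> must not end the element
      i += 7;
    } else {
      i++;
    }
  }
  return -1;
}
```

For the example above this returns the position of the final
</script>. Without the leading <!-- it returns the position of the one
inside the string literal.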

> I’m also left bemused by the purpose of the “Script data escape start
> state”. Some non-normative text that explains the rationale behind this
> state, amplified by some concrete examples, would be a great improvement to
> the standard.

This is definitely an area where greater clarity would be nice!

> Sincerely
>
> Herman Venter

-- 
Philip Taylor
excors@gmail.com
Received on Sunday, 21 March 2010 21:11:58 GMT