Re: Restrictions for contents of script elements

On Wed, 14 Apr 2010, Simon Pieters wrote:
> On Tue, 13 Apr 2010 22:03:42 +0200, Ian Hickson <ian@hixie.ch> wrote:
> > On Tue, 13 Apr 2010, Julien Royer wrote:
> > > 
> > > I don't understand the restrictions defined for the content of script
> > > elements:
> > > http://dev.w3.org/html5/spec/semantics.html#restrictions-for-contents-of-script-elements
> > > 
> > > script being a raw-text element, it can't contain the "script-end"
> > > production
> > > (http://www.w3.org/TR/html5/syntax.html#cdata-rcdata-restrictions).
> > > 
> > > Why do we need such a complex ABNF for the content of script elements?
> > 
> > Unfortunately for historical reasons the parsing rules for <script> blocks
> > are really obscure and can lead to some really strange results. For
> > example:
> > 
> > "<script><script></script>" closes at the </script>
> > "<script><!--</script>" closes at the </script>
> > "<script><!--<script></script></script>" closes at the _second_ </script>
> > 
> > Since we're basically stuck living with these silly rules (they're needed
> > to parse legacy documents), we have the complex ABNF you refer to to
> > prevent authors from trying to write stuff that doesn't work right.
> 
> But the rule for raw text elements in #writing already bans the third example,
> because it contains "</script>".

We still need it to be non-conforming in XML and in the DOM.


> If we want to make "<script><!--<script></script>--></script>" conforming,
> then script shouldn't be a raw text element in #writing.

We don't want that to be conforming, do we?

There are several overlapping constraints here:

 - We don't want <script> in XML or the DOM to contain markup that can't 
   be safely serialised as HTML and round-tripped using the HTML parser 
   (modulo "acceptable" losses such as whitespace near the </body>). 
   Similarly, we don't want to technically allow the serialisation of HTML 
   that doesn't match how HTML is parsed.

 - We don't want <script> in HTML to contain markup that might cause 
   problems in legacy UAs. (This can be relaxed later, once legacy UAs 
   are history and the HTML parsing rules are uniformly implemented.)

The second one means disallowing "</script>" for now, even if it would be 
parsed correctly.

The first one means disallowing "<!--<script>" (amongst other things).

It's the combination of these constraints that leads to the rules in the 
spec now.
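
To make the two constraints concrete, here is a rough Python sketch. It
is a simplified model of how the parser finds the end of a script block,
not the spec's tokenizer and not its ABNF. The names script_end,
round_trips and legacy_safe are made up for this illustration, and
legacy_safe is only a crude stand-in for "might cause problems in legacy
UAs".

```python
import re

# "<script" and "</script" only count when followed by whitespace, "/" or ">".
_DELIM = r"[\t\n\f\r />]"
_OPEN = re.compile(r"<script" + _DELIM, re.IGNORECASE)
_CLOSE = re.compile(r"</script" + _DELIM, re.IGNORECASE)


def script_end(text):
    """Return the index of the "</script" that closes the element, or None.

    `text` is everything after the opening <script> tag.
    """
    state = "data"
    for i in range(len(text)):
        if state == "data":
            if _CLOSE.match(text, i):
                return i
            if text.startswith("<!--", i):
                state = "escaped"
        elif text.startswith("-->", i):
            # "-->" leaves both the escaped and the double-escaped state.
            state = "data"
        elif state == "escaped":
            if _CLOSE.match(text, i):
                return i
            if _OPEN.match(text, i):
                state = "double-escaped"
        elif _CLOSE.match(text, i):
            # Double-escaped: "</script>" only undoes the inner "<script>".
            state = "escaped"
    return None


def round_trips(content):
    """First constraint: serialising `content` and reparsing gives it back."""
    return script_end(content + "</script>") == len(content)


def legacy_safe(content):
    """Second constraint (crudely): no "</script" anywhere in the content."""
    return "</script" not in content.lower()
```

With this model, script_end("<!--<script></script></script>") returns
21, the offset of the second </script>, which matches the third example
quoted above. round_trips("<!--<script>") is False because the
serialised end tag gets swallowed by the double-escaped state.
"<!--<script></script>-->" does round-trip, but legacy_safe rejects it
because it contains "</script>".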

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Thursday, 29 July 2010 00:19:08 UTC