- From: Ian Hickson <ian@hixie.ch>
- Date: Thu, 29 Jul 2010 00:18:40 +0000 (UTC)
- To: Simon Pieters <simonp@opera.com>
- cc: Julien Royer <eldebaran@gmail.com>, public-html-comments@w3.org
On Wed, 14 Apr 2010, Simon Pieters wrote:
> On Tue, 13 Apr 2010 22:03:42 +0200, Ian Hickson <ian@hixie.ch> wrote:
> > On Tue, 13 Apr 2010, Julien Royer wrote:
> > >
> > > I don't understand the restrictions defined for the content of script
> > > elements:
> > > http://dev.w3.org/html5/spec/semantics.html#restrictions-for-contents-of-script-elements
> > >
> > > script being a raw-text element, it can't contain the "script-end"
> > > production
> > > (http://www.w3.org/TR/html5/syntax.html#cdata-rcdata-restrictions).
> > >
> > > Why do we need such a complex ABNF for the content of script elements?
> >
> > Unfortunately for historical reasons the parsing rules for <script> blocks
> > are really obscure and can lead to some really strange results. For
> > example:
> >
> >    "<script><script></script>" closes at the </script>
> >    "<script><!--</script>" closes at the </script>
> >    "<script><!--<script></script></script>" closes at the _second_ </script>
> >
> > Since we're basically stuck living with these silly rules (they're needed
> > to parse legacy documents), we have the complex ABNF you refer to to
> > prevent authors from trying to write stuff that doesn't work right.
>
> But the rule for raw text elements in #writing already bans the third example,
> because it contains "</script>".

We still need it non-conforming in XML and in the DOM.

> If we want to make "<script><!--<script></script>--></script>" conforming,
> then script shouldn't be a raw text element in #writing.

We don't want that to be conforming, do we?

There are several overlapping constraints here:

- We don't want <script> in XML or the DOM to contain markup that can't be
  safely serialised as HTML and round-tripped using the HTML parser (modulo
  "acceptable" losses such as whitespace near the </body>). Similarly, we
  don't want to technically allow the serialisation of HTML that doesn't
  match how HTML is parsed.
- We don't want <script> in HTML to contain markup that might cause
  problems in legacy UAs. (This can be relaxed later, once legacy UAs are
  history and the HTML parsing rules are uniformly implemented.)

The second one means disallowing "</script>" for now, even if it would be
parsed correctly. The first one means disallowing "<!--<script>" (amongst
other things).

It's the combination of these constraints that leads to the rules in the
spec now.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Thursday, 29 July 2010 00:19:08 UTC
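
The three quoted examples can be reproduced with a rough model of how an
HTML parser scans the text inside a <script> element. The Python below is a
minimal sketch under stated assumptions, not the spec's tokenizer: it uses
only three states, ignores end-of-file handling, and the function name
find_script_end and the state names are hypothetical, chosen for
illustration.

```python
import re

# "<script" or "</script" counts as a tag only when followed by
# whitespace, "/" or ">" (assumption: this mirrors the parser's check).
_TAG = re.compile(r"<(/?)script(?=[\t\n\f\r />])", re.IGNORECASE)


def find_script_end(content):
    """Return the index of the "</script" that closes the element, or None.

    `content` is the text that follows the opening <script> tag.
    """
    # "data":    ordinary script text
    # "escaped": inside "<!--"
    # "double":  inside "<!--" and after a nested "<script"
    state = "data"
    i = 0
    while i < len(content):
        if state == "data" and content.startswith("<!--", i):
            state = "escaped"
            i += 2  # keep the two dashes so that "<!-->" ends the escape
            continue
        if state != "data" and content.startswith("-->", i):
            state = "data"
            i += 3
            continue
        m = _TAG.match(content, i)
        if m:
            is_end_tag = m.group(1) == "/"
            if is_end_tag and state != "double":
                return i  # this end tag really closes the element
            if is_end_tag:
                state = "escaped"  # only undoes the nested "<script"
            elif state == "escaped":
                state = "double"
            i = m.end()
            continue
        i += 1
    return None


# The three examples from the message, minus the opening <script> tag.
print(find_script_end("<script></script>"))               # 8
print(find_script_end("<!--</script>"))                   # 4
print(find_script_end("<!--<script></script></script>"))  # 21, the second one
```

In the third case the first "</script>" only cancels the nested "<script"
seen after "<!--", so a serialiser that emitted "<!--<script>" followed by a
single "</script>" would leave the element open. That is the round-tripping
problem the first constraint guards against.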