- From: <bugzilla@jessica.w3.org>
- Date: Mon, 21 Oct 2013 18:47:33 +0000
- To: public-html-bugzilla@w3.org
https://www.w3.org/Bugs/Public/show_bug.cgi?id=23587

            Bug ID: 23587
           Summary: Provide rationale for content restrictions for script tag
           Product: HTML WG
           Version: unspecified
          Hardware: PC
                OS: Windows NT
            Status: NEW
          Severity: normal
          Priority: P3
         Component: HTML5 spec
          Assignee: dave.null@w3.org
          Reporter: qbolec@gmail.com
        QA Contact: public-html-bugzilla@w3.org
                CC: mike@w3.org, public-html-admin@w3.org,
                    public-html-wg-issue-tracking@w3.org

Consider the following HTML:

    <!doctype html>
    <html>
    <head>
    <script type="text/javascript">
    var user={name:"Jakub <!-- <script>"}</script>
    <!-- innocent comment -->
    <script type="text/javascript">
    console.log("Hello" + user.name);
    </script>
    </head>
    </html>

To put it into perspective, imagine that the server which generated it was executing something "innocent" like this:

    <?php echo 'var user=' . json_encode($user) ?>

The problem is that, given the current automaton description found at http://www.w3.org/TR/html5/syntax.html#script-data-state, the "<!--" found in the username gets matched with the "-->" located after the "innocent comment". Moreover, the script body surprisingly extends into the "next" script. This can be verified in the current version of Chrome, for example, using Chrome's console:

    $$('script')[0].innerHTML
    "
    var user={name:"Jakub <!-- <script>"}</script>
    <!-- innocent comment -->
    <script type="text/javascript">
    console.log("Hello" + user.name);
    "

Observe that there is no warning for the developer at the moment the HTML is parsed. However, when the JS parser kicks in, it gives a (rather) surprising error:

    Uncaught SyntaxError: Invalid regular expression: missing /

The reason is that the line

    var user={name:"Jakub <!-- <script>"}</script>

gets parsed as

    var user=X<Y

where Y is "/script>", which resembles a regular expression.

Now, what I want to complain about is that the story can end in various different ways depending on such "details" as:

1. whether I put the </script> on the same line or not
2. whether I put a semicolon after the definition of the user variable
3. whether I have an <!-- innocent comment --> after the tag or not
4. whether I have a second <script> tag after the innocent comment or not

Clearly, this does not help to reach the goals mentioned in Section "1.10.3 Restrictions on content models and on attribute values", as I wasted 8 hours today debugging this issue in a real-life scenario. The reason it was so hard to debug was that it required the alignment of so many planets to reproduce: the username had to contain both <!-- and <script> but not </script>, we needed an HTML comment afterwards, and another script tag. All of these were independent conditions which happened or not depending on things like adserver targeting.

It would help me a lot if the section "4.3.1.2 Restrictions for contents of script elements" at least provided the reasons behind this strange set of rules. I would really like to understand why the "double escape" mode triggered by the "<!-- <script>" combo is needed. It would help even more if some practices were suggested which could help avoid such problems (for example: "Authors should always escape the "<" character as "\x3C" in their strings" or something).

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
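The escaping practice suggested in the report can be sketched as follows. This is a minimal illustration only, not text from the spec: the helper name `toInlineScriptLiteral` is hypothetical, and it assumes the value is plain JSON-serializable data. It emits `\u003C` rather than `\x3C` because that form is valid in both JSON and JavaScript string literals; either one decodes to "<". On the PHP side, `json_encode($user, JSON_HEX_TAG)` should have a comparable effect, since that flag encodes "<" and ">" as `\u003C` and `\u003E`.

```javascript
// Hypothetical helper (an assumption for illustration, not part of any spec):
// serialize plain data for embedding inside an inline <script> element.
function toInlineScriptLiteral(value) {
  // JSON.stringify yields a valid JS expression for plain data.
  // Replacing every "<" with the \u003C escape leaves the decoded value
  // unchanged, but the markup no longer contains "<!--", "<script"
  // or "</script", so the tokenizer never enters the escaped states.
  return JSON.stringify(value).replace(/</g, '\\u003C');
}

const user = { name: 'Jakub <!-- <script>' };
const literal = toInlineScriptLiteral(user);

// The server would then emit: var user=<literal>;
console.log('var user=' + literal + ';');
```

With the username from the report, the generated script text contains no "<" at all, while `user.name` still evaluates to the original string at run time.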
Received on Monday, 21 October 2013 18:47:36 UTC