- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Mon, 10 Aug 2009 14:14:48 +0300
- To: HTMLWG WG <public-html@w3.org>
Firefox nightlies have had an HTML5 parser implementation behind a pref for a month now. The Web compat issues that have been uncovered have been surprisingly few, which is great.

However, there are three Web compat issues that don't have trivial fixes. They are all related to the HTML5 parsing algorithm not recovering from errors by rewinding the stream and reparsing with different rules. As such, if these are treated as bugs, they are spec bugs.

1) When the string "<!--" occurs inside a string literal in JavaScript, it starts an escape that hides </script>, and the rest of the page is eaten into the script.
https://bugzilla.mozilla.org/show_bug.cgi?id=503632

2) When a script starts with <script><!-- but doesn't end with --></script> (ends with only </script>), the rest of the page is eaten into the script.
https://bugzilla.mozilla.org/show_bug.cgi?id=504941

3) When there's no </title> end tag, the page gets eaten into the title.
https://bugzilla.mozilla.org/show_bug.cgi?id=508075
see also
https://bugs.webkit.org/show_bug.cgi?id=3905
https://bugzilla.mozilla.org/show_bug.cgi?id=42945

Personally, I'd like to avoid reparsing if at all possible, because it's a security risk and because it complicates the parser.

In case #1, I think the right fix is to introduce more statefulness into the escapes so that <!-- and --> that occur inside string literals are heuristically ignored. (Anyone care to suggest a heuristic that doesn't involve rolling a JS parser into the HTML parser?)

For case #2, I can't think of a fix that doesn't involve reparsing. Personally, I'd just leave it as WONTFIX and position the change from previous browser behavior as a security improvement. (To my great surprise, there haven't been reports of this issue with actual comments, only with escapes inside inline scripts.)

For case #3, I'd personally like to treat it as WONTFIX, because IE6 and IE8 both seem to do less recovery here than Gecko and WebKit. Therefore, pages that lack </title> are probably already broken in IE, so it's unlikely that such pages are common enough to be a big deal on the Web scale. (IE seems to recover sometimes but only rarely. I can't figure out what the recovery rule is.)

Any thoughts on what the right way to deal with these is?

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
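
For illustration, cases #1 and #2 can be reproduced with a toy model of a tokenizer that never rewinds. The Python sketch below is an assumption-laden simplification, not the HTML5 tokenizer and not Gecko's code: it treats "<!--" inside a script as the start of an escape that only "-->" ends, and it honours "</script>" only outside an escape. The function name find_script_end and the sample inputs are made up for this example.

```python
def find_script_end(text):
    """Return the index of the "</script>" that ends the script, or -1.

    Toy model only (an assumption, not the spec algorithm):
    "<!--" opens an escape, "-->" closes it, and "</script>"
    is recognised only while no escape is open. The scan is a
    single forward pass, so there is no rewinding or reparsing.
    """
    lowered = text.lower()
    i = 0
    escaped = False
    while i < len(lowered):
        if not escaped and lowered.startswith("<!--", i):
            escaped = True          # escape starts, even inside a JS string
            i += 4
        elif escaped and lowered.startswith("-->", i):
            escaped = False         # escape ends
            i += 3
        elif not escaped and lowered.startswith("</script>", i):
            return i                # end tag seen outside an escape
        else:
            i += 1
    return -1                       # script swallows the rest of the input


# Case #1: "<!--" inside a JavaScript string literal.
case1 = 'var s = "<!--";</script><p>rest of page</p>'
# Case #2: script opens an escape but ends with a bare </script>.
case2 = '<!--\nalert(1);\n</script><p>rest of page</p>'
# Well-formed escape: "-->" appears before </script>.
closed = '<!--\nalert(1);\n// --></script><p>rest of page</p>'
```

In this model, case1 and case2 both return -1 (the end tag stays hidden, so the rest of the page is eaten into the script), while closed returns the position of its end tag. A heuristic of the kind asked about for case #1 would need extra state in the loop to decide when "<!--" should not set the escape flag.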
Received on Monday, 10 August 2009 11:15:32 UTC