- From: Simon Pieters <simonp@opera.com>
- Date: Thu, 11 Dec 2014 09:09:11 +0100
- To: "whatwg@whatwg.org" <whatwg@whatwg.org>
The spec's parsing rules for meta refresh cause infinite reloading on some pages. In particular, the spec requires "url=" to be present, but there are pages that omit it. IE9 apparently also requires "url=". Gecko/Blink/WebKit allow "url=" to be omitted.

For example, http://www.only-for-winners.com/ has

    <meta http-equiv="refresh" content="0;http://www.aldanitinetwork.com" />

Clearly this is intended to redirect, not to reload the current page after 0 seconds.

    SELECT page, COUNT(*) AS num
    FROM [httparchive:runs.2014_08_15_requests_body]
    WHERE page = url
      AND mimeType CONTAINS "html"
      AND REGEXP_MATCH(LOWER(body), r"<meta\s+[^>]*http-equiv\s*=\s*[\"']?refresh")
      AND REGEXP_MATCH(LOWER(body), r"<meta\s+[^>]*content\s*=\s*[\"']?\s*\d+\s*;\s*[^\"'>]")
      AND NOT REGEXP_MATCH(LOWER(body), r"<meta\s+[^>]*content\s*=\s*[\"']?\s*\d+\s*;\s*url=")
    GROUP BY page

23 rows.

I also noticed that Gecko allows the number to be omitted. I found only one page doing that, and it was using

    <meta http-equiv="refresh" content=";URL=">

so it seems we can fail parsing for that case.

--
Simon Pieters
Opera Software
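A minimal sketch of the lenient parsing discussed above, assuming the behaviour described for Gecko/Blink/WebKit ("url=" is optional) combined with the proposal to fail when the delay number is missing. The function name `parse_meta_refresh` and its return convention are hypothetical; this is not the spec's algorithm verbatim, and it skips steps such as resolving the URL against the document.

```python
from typing import Optional, Tuple

WHITESPACE = " \t\n\f\r"
DIGITS = "0123456789"


def parse_meta_refresh(content: str) -> Optional[Tuple[int, Optional[str]]]:
    """Hypothetical parser for a <meta http-equiv="refresh"> content value.

    Returns (delay_seconds, url), where url is None for a plain reload,
    or None when the value should be ignored (parse failure).
    """
    s = content.lstrip(WHITESPACE)

    # Collect the leading ASCII digits; a missing number fails parsing,
    # so content=";URL=" is ignored rather than reloading the page.
    i = 0
    while i < len(s) and s[i] in DIGITS:
        i += 1
    if i == 0:
        return None
    delay = int(s[:i])

    # Skip any further digits and periods, e.g. "1.5" is treated as 1.
    while i < len(s) and s[i] in DIGITS + ".":
        i += 1
    rest = s[i:]
    if not rest:
        return delay, None  # just a delay: reload the current page

    # The number must be followed by ';', ',' or whitespace.
    if rest[0] not in ";," + WHITESPACE:
        return None
    rest = rest.lstrip(WHITESPACE)
    if rest[:1] in (";", ","):
        rest = rest[1:]
    rest = rest.lstrip(WHITESPACE)

    # The "url=" prefix is optional and matched case-insensitively;
    # without it, the remainder is taken as the URL (lenient behaviour).
    if rest[:3].lower() == "url":
        after = rest[3:].lstrip(WHITESPACE)
        if after.startswith("="):
            rest = after[1:].lstrip(WHITESPACE)

    # Strip an opening quote and truncate at its matching closing quote.
    if rest[:1] in ("'", '"'):
        quote = rest[0]
        rest = rest[1:]
        end = rest.find(quote)
        if end != -1:
            rest = rest[:end]

    url = rest.rstrip(WHITESPACE)
    # Assumption: an empty URL is treated as a plain reload.
    return delay, (url or None)
```

With this sketch, `parse_meta_refresh("0;http://www.aldanitinetwork.com")` yields a redirect instead of a reload, and `parse_meta_refresh(";URL=")` returns None, so the refresh is ignored.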
Received on Thursday, 11 December 2014 08:07:49 UTC