- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Tue, 06 Jan 2015 08:35:54 +0100
- To: Simon Pieters <simonp@opera.com>, "whatwg@whatwg.org" <whatwg@whatwg.org>
On 2014-12-11 09:09, Simon Pieters wrote:
> The spec's parsing rules of meta refresh causes infinite reloading on
> some pages. In particular, the spec requires the "url=" to be present,
> but there are pages that omit it. IE9 also requires "url=" apparently.
> Gecko/Blink/WebKit allow "url=" to be omitted.
>
> For example, there is http://www.only-for-winners.com/ which has
>
> <meta http-equiv="refresh"
> content="0;http://www.aldanitinetwork.com" />
>
> Clearly this is intended to redirect, not reload the current page after
> 0 seconds.
>
>
> SELECT page, COUNT(*) AS num
> FROM [httparchive:runs.2014_08_15_requests_body]
> WHERE page = url
> AND mimeType CONTAINS "html"
> AND REGEXP_MATCH(LOWER(body),
> r"<meta\s+[^>]*http-equiv\s*=\s*[\"']?refresh")
> AND REGEXP_MATCH(LOWER(body),
> r"<meta\s+[^>]*content\s*=\s*[\"']?\s*\d+\s*;\s*[^\"'>]")
> AND NOT REGEXP_MATCH(LOWER(body),
> r"<meta\s+[^>]*content\s*=\s*[\"']?\s*\d+\s*;\s*url=")
> GROUP BY page
>
> 23 rows.
>
> I also noticed that Gecko allows the number to be omitted. I only found
> one page doing that and it was using <meta http-equiv="refresh"
> content=";URL="> so it seems we can fail parsing for that case.
>

I hear (a) these pages have been broken in IE for a long time, and (b)
only 23 (?) pages in your DB are found.

So why not just leave them broken?

Best regards, Julian
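
For reference, the lenient behaviour under discussion (accepting a target with or without the "url=" prefix, while still failing when the number is missing) could be sketched roughly as follows. This is only an illustration, not the spec's algorithm and not any browser's code: the function name parse_refresh and the regular expression are made up for this sketch, only ";" is accepted as the separator, and quoted URLs are not handled.

```python
import re

# Hypothetical lenient pattern, for illustration only:
# a required delay, then optionally ";" and a target whose
# "url=" prefix may be present or omitted.
_REFRESH = re.compile(
    r"""^\s*(?P<delay>\d+)   # required non-negative integer delay
        \s*(?:;\s*           # separator before the target
        (?:url\s*=\s*)?      # optional prefix, matched case-insensitively
        (?P<target>.*?))?    # the target itself, possibly empty
        \s*$""",
    re.IGNORECASE | re.VERBOSE | re.DOTALL,
)


def parse_refresh(content):
    """Return (delay, target) or None if the value cannot be parsed.

    target is None when no URL is given, which means "reload this page".
    """
    m = _REFRESH.match(content)
    if m is None:
        return None  # e.g. ";URL=" has no number, so parsing fails
    target = m.group("target") or None
    return int(m.group("delay")), target


if __name__ == "__main__":
    for value in ("0;http://www.aldanitinetwork.com",
                  "0; URL=http://www.aldanitinetwork.com",
                  "5",
                  ";URL="):
        print(repr(value), "->", parse_refresh(value))
```

With a parser along these lines, the example page's value would be treated as a redirect rather than a reload of the current page, and the ";URL=" case would still fail to parse.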
Received on Tuesday, 6 January 2015 12:41:35 UTC