Re: [whatwg] Parsing of meta refresh needs tweaking

On Tue, 06 Jan 2015 08:35:54 +0100, Julian Reschke <julian.reschke@gmx.de>  
wrote:

> On 2014-12-11 09:09, Simon Pieters wrote:
>> The spec's parsing rules for meta refresh cause infinite reloading on
>> some pages. In particular, the spec requires the "url=" to be present,
>> but there are pages that omit it. IE9 also requires "url=" apparently.
>> Gecko/Blink/WebKit allow "url=" to be omitted.
>>
>> For example, there is http://www.only-for-winners.com/ which has
>>
>>     <meta http-equiv="refresh"
>> content="0;http://www.aldanitinetwork.com" />
>>
>> Clearly this is intended to redirect, not reload the current page after
>> 0 seconds.
>>
>>
>> SELECT page, COUNT(*) AS num
>> FROM [httparchive:runs.2014_08_15_requests_body]
>> WHERE page = url
>> AND mimeType CONTAINS "html"
>> AND REGEXP_MATCH(LOWER(body),
>> r"<meta\s+[^>]*http-equiv\s*=\s*[\"']?refresh")
>> AND REGEXP_MATCH(LOWER(body),
>> r"<meta\s+[^>]*content\s*=\s*[\"']?\s*\d+\s*;\s*[^\"'>]")
>> AND NOT REGEXP_MATCH(LOWER(body),
>> r"<meta\s+[^>]*content\s*=\s*[\"']?\s*\d+\s*;\s*url=")
>> GROUP BY page
>>
>> 23 rows.
>>
>> I also noticed that Gecko allows the number to be omitted. I only found
>> one page doing that and it was using <meta http-equiv="refresh"
>> content=";URL="> so it seems we can fail parsing for that case.
>>
>
> I hear (a) these pages have been broken in IE for a long time, and (b)  
> only 23 (?) pages in your DB are found.

Right.

> So why not just leave them broken?

It's a worse user experience, and changing IE is a shorter path to interop
than changing Gecko, Blink and WebKit, which already agree.
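
To make the proposed behavior concrete, here is a rough Python sketch of a
lenient parser for the content attribute. It is only an illustration under
my own assumptions, not the spec's algorithm or any engine's actual code:
the name parse_meta_refresh and the regex are hypothetical. It treats "url="
as optional (as Gecko/Blink/WebKit do) and fails when the number is missing
(so content=";URL=" does not parse).

```python
import re

# Hypothetical sketch, not the spec's algorithm: a required integer delay,
# then an optional separator followed by a target whose "url=" prefix may
# be omitted.
_REFRESH = re.compile(
    r"""^\s*(\d+)            # delay in whole seconds (required)
        (?:\.[\d.]*)?        # fractional part, ignored
        \s*
        (?:[;,]\s*           # separator between delay and target
           (?:url\s*=\s*)?   # optional url= prefix, case-insensitive
           (.*?)             # redirect target
        )?\s*$""",
    re.IGNORECASE | re.VERBOSE | re.DOTALL,
)


def parse_meta_refresh(content):
    """Return (delay, url) or None if parsing fails.

    url is None when no target is given, meaning "reload the current page".
    """
    m = _REFRESH.match(content)
    if m is None:
        return None  # e.g. the number is missing
    delay = int(m.group(1))
    url = m.group(2) or None  # empty target counts as no target
    return delay, url
```

With this sketch, the content value from the page cited above,
"0;http://www.aldanitinetwork.com", yields a redirect to that URL rather
than a reload, and ";URL=" fails to parse.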

-- 
Simon Pieters
Opera Software

Received on Wednesday, 7 January 2015 07:51:58 UTC