prohibited code points and error handling in Chrome and MSIE from Chris Weber on 2011-07-04 (public-iri@w3.org from July 2011)

From: Chris Weber <chris@lookout.net>
Date: Sun, 03 Jul 2011 17:47:38 -0700
To: "PUBLIC-IRI@W3.ORG" <PUBLIC-IRI@w3.org>
Message-ID: <4E110DAA.30203@lookout.net>

I'm curious about a test case that caught my attention:

(<a href='http://example.com/&#xfdd0;foo' id='302'>302</a><img 
src='http://example.com/&#xfdd0;foo' />)

For Chrome - do you know whether the Chrome result shown in the table 
below is how IRI parsing should be represented in the DOM? Other test 
cases, such as <http://&#xD87E;&#xDC68;.com>, give the same result, and 
so do plain URI cases such as <http://[::eeee:192.168.0.1]/>.

For IE - is the transformation of U+FDD0 to a "?" the expected handling 
of prohibited characters, or some other fallback path? Transformations 
like this seem dangerous for security reasons, e.g. they could be used 
to bypass filters.
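
To make the filter concern concrete, here is a minimal Python sketch. It 
assumes the MSIE behavior can be modeled as a plain substitution of U+FDD0 
with "?", which is only inferred from the observed results, not from any 
documented IE behavior. The helper name substitute_question_mark is 
hypothetical. It shows that the substitution moves "foo" out of the path and 
into the query, so a filter that inspected the original string would see a 
different URL structure than the one actually requested.

```python
from urllib.parse import urlsplit

NONCHAR = "\ufdd0"  # the noncharacter used in the test case


def substitute_question_mark(url: str) -> str:
    """Hypothetical model of the observed MSIE result: U+FDD0 becomes '?'."""
    return url.replace(NONCHAR, "?")


original = "http://example.com/" + NONCHAR + "foo"
rewritten = substitute_question_mark(original)

before = urlsplit(original)
after = urlsplit(rewritten)

# Before the substitution everything after the host is path; there is no query.
print(repr(before.path), repr(before.query))

# After the substitution the path is "/" and "foo" has become the query.
print(repr(after.path), repr(after.query))
```

With this model the rewritten URL is http://example.com/?foo, which matches 
the "/?foo" seen in the raw MSIE requests.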

U+FDD0 is prohibited under IDNA2003's nameprep step, and disallowed by 
IDNA2008. The results below are from the DOM parsing.

Scheme  Hostname     Path           Query  Browser
:                                          Chrome/12.0
http:   example.com  /%EF%B7%90foo         Opera/9.80
http:   example.com                 ?zyx   MSIE 7.0
http:   example.com                 ?zyx   MSIE 8.0
http:   example.com  /%EF%B7%90foo         Firefox/4.0.1
http:   example.com  /﷐foo                 Safari/5.0.5


The raw HTTP request results for the <img> are as follows. Chrome is 
not listed because it did not make the request for the <img>.

Path Browser
/%EF%B7%90foo Opera/9.80
/?foo MSIE 7.0
/?foo MSIE 8.0
/%EF%B7%90foo Firefox/4.0.1
/%EF%B7%90foo Safari/5.0.5

Although Chrome did not make a request for the <img>, the <a> link is 
still clickable and resolves to the percent-encoded Unicode replacement 
character U+FFFD in the path "/%EF%BF%BDfoo".
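
The two percent-encoded forms seen in these results can be reproduced with a 
short Python sketch. It assumes UTF-8 percent-encoding of the path, which is 
what the "%EF%B7%90" and "%EF%BF%BD" sequences correspond to; the helper 
name percent_encode_path is illustrative only.

```python
from urllib.parse import quote


def percent_encode_path(path: str) -> str:
    """Percent-encode a path as UTF-8, leaving '/' unescaped."""
    return quote(path, safe="/")


# U+FDD0 kept as-is and percent-encoded (Opera, Firefox, Safari on the wire).
print(percent_encode_path("/\ufdd0foo"))  # /%EF%B7%90foo

# U+FDD0 replaced by U+FFFD before encoding (Chrome when the link is clicked).
print(percent_encode_path("/\ufffdfoo"))  # /%EF%BF%BDfoo
```

The two outputs differ only in the last two bytes of the three-byte UTF-8 
sequence, so the replacement is easy to miss when comparing logs by eye.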


Best regards,
Chris

Received on Monday, 4 July 2011 00:48:15 UTC