- From: Chris Weber <chris@lookout.net>
- Date: Thu, 21 Jul 2011 17:02:13 -0700
- To: "PUBLIC-IRI@W3.ORG" <PUBLIC-IRI@w3.org>
I'm going on a tangent from Martin's intent in the previous email, but it seems to be in the same vein overall. I was including some mixed-encoding tests: iso-8859-1 mixed with UTF-8 in a hyperlink on a transitional HTML page served with the "iso-8859-1" Content-Type. The results are similar to Martin's test in that bytes representing UTF-8 are (most often) treated as such, even when the page encoding is iso-8859-1.

From the test page at <http://lookout.net/test/iri/mixenc.php>, Test 3 mixes the raw bytes that would represent U+FF21 FULLWIDTH LATIN CAPITAL LETTER A in UTF-8 with the iso-8859-1 raw byte for the "ü" in "Dürst". The following hyperlink represents the test case, where <0xNN> is a raw byte.

http://www.example.com/D<0xFC>rst/?<0xEF 0xBC 0xA1>

The results of the display are as follows.

Opera (11.50, Win7): http://www.example.com/Dürst/?%EF%BC%A1
Firefox (5.0, Win7): http://www.example.com/Dürst/?A
IE (8.0.7601.17514, Win7): http://www.example.com/Dürst/?A
Chrome (12.0.742.122, Win7): http://www.example.com/Dürst/?A
Safari (5.0.4 (7533.20.27)): http://www.example.com/Dürst/?A

With the exception of IE, all of the above generated the following HTTP request:

GET /D%C3%BCrst/?%EF%BC%A1

IE of course does not escape the bytes in the query string.

GET /D%C3%BCrst/?A

I tried to capture some of these test results in table form at:
<https://spreadsheets0.google.com/spreadsheet/ccc?key=0AifoWoA0trUndEZSTlRRNnd5MzE3N3RYOVlIVFFMREE&hl=en_US#gid=5>

A question for browser implementers. In some cases it's obvious (Opera and MSIE) and in others not so much: do you know if the status bar display is using the page encoding, or has the URI been converted to UTF-8 for display?

Best regards,
Chris
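
For readers who want to reproduce the byte-level arithmetic of Test 3, here is a minimal Python sketch. It is an illustration only, not a description of any browser's internals. It assumes a simple model: the whole link is decoded with the page encoding, the path is re-encoded as UTF-8, and the query is re-encoded with the page encoding before percent-escaping. The helper name `request_line` and that model are assumptions made for this example. The raw bytes and the expected request line come from the message above.

```python
from urllib.parse import quote

# Raw bytes of the Test 3 hyperlink as described in the message:
# 0xFC is "ü" in iso-8859-1; 0xEF 0xBC 0xA1 is U+FF21 in UTF-8.
RAW_LINK = b"http://www.example.com/D\xfcrst/?\xef\xbc\xa1"


def request_line(raw_link: bytes, page_encoding: str = "iso-8859-1") -> str:
    """Hypothetical model of how the observed GET line could arise.

    Assumption: the link is decoded with the page encoding, the path is
    re-encoded as UTF-8, and the query is re-encoded with the page encoding.
    """
    text = raw_link.decode(page_encoding)
    head, _, query = text.partition("?")

    # The path starts at the first "/" after "scheme://host".
    authority_start = head.index("//") + 2
    path = head[head.index("/", authority_start):]

    # Path: UTF-8 bytes, percent-escaped (slashes kept).
    escaped_path = quote(path.encode("utf-8"), safe="/")
    # Query: bytes in the page encoding, percent-escaped.
    escaped_query = quote(query.encode(page_encoding), safe="")
    return f"GET {escaped_path}?{escaped_query}"


if __name__ == "__main__":
    # The three query bytes are valid UTF-8 for U+FF21.
    print(b"\xef\xbc\xa1".decode("utf-8") == "\uff21")
    print(request_line(RAW_LINK))
```

Under this model the query bytes survive the round trip through iso-8859-1 unchanged, so they are escaped as `%EF%BC%A1`, while the single byte 0xFC in the path becomes the two UTF-8 bytes `%C3%BC`.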
Received on Friday, 22 July 2011 00:02:43 UTC