- From: 신정식, 申政湜 <jungshik@google.com>
- Date: Thu, 21 Jul 2011 18:09:33 -0700
- To: Chris Weber <chris@lookout.net>
- Cc: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>, public-iri@w3.org
- Message-ID: <CADaTyXVo+qNdxPgaM8kbWoqi+GvV+jiiuT9pKC1xR7kfrTox_A@mail.gmail.com>
On Thu, Jul 21, 2011 at 3:44 PM, Chris Weber <chris@lookout.net> wrote:

> On 7/21/2011 1:15 PM, Leif Halvard Silli wrote:
>
>> The page in question uses Windows-1252/ISO-8859-1. Question: Would it
>> have made a difference if instead of using ISO-8859-1 based percent
>> encoding, Martin had typed the letter 'ü' directly?
>
> Yes it would, see "test 2" at <http://lookout.net/test/iri/mixenc.php>.
> Using the same browser builds Martin did, but a slightly different test
> setup.

Your test and Martin's test are totally different. 0xFC embedded in an ISO-8859-1 encoded page represents the character U+00FC [1], while a stand-alone "%FC" in the path part, in a page of any encoding, should NOT be interpreted as a character because it's not a valid UTF-8 sequence.

Jungshik

[1] 0xFC (raw byte) in ISO-8859-5 would be the character U+045C, but a stand-alone '%FC' in the path part should remain escaped no matter what the page encoding is.

> Test 2 from my set maps to Martin's Test 1 in that the "Dürst" is a part of
> the path component and encoded in iso-8859-1 - he percent-encoded %FC and I
> used the raw byte 0xFC. The test case is represented below, where <0xHH>
> represents a raw byte.
>
> http://www.example.com/D<0xFC>rst/
>
> The results of display are below.
>
> Opera (11.50, Win7):
> http://www.example.com/Dürst/
>
> Note here that the raw byte <0xFC> was visibly converted to Unicode <0xC3
> 0xBC> and displayed as iso-8859-1 (presumably) in the display.
>
> Firefox (5.0, Win7):
> http://www.example.com/Dürst/
>
> IE (8.0.7601.17514, Win7):
> http://www.example.com/Dürst/
>
> Chrome (12.0.742.122, Win7):
> http://www.example.com/Dürst/
>
> Safari (5.0.4 (7533.20.27)):
> http://www.example.com/Dürst/
>
> In all of the above cases, the <0xFC> was transcoded to UTF-8 and
> percent-encoded for the generated HTTP request.
>
> http://www.example.com/D%C3%BCrst/
>
>> Because, if, in an ISO-8859-1 encoded page, href="D%FCrst" does not work
>> as well as href="Dürst", then I think HTML5 validators in fact should
>> warn against use of percent encoding that isn't UTF-8 based.
>
> That would probably be ideal but would not provide for raw data that might
> need to be passed in the IRI, especially the query component.
>
> Best regards,
> Chris
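
The distinction discussed in this thread (a raw byte that is decoded via the page encoding versus a stand-alone percent-escape that is not valid UTF-8) can be sketched roughly in Python. This is only an illustration under stated assumptions, not how any of the tested browsers is implemented; the function names `request_path_for_raw_byte` and `is_valid_utf8_escape` are hypothetical and do not come from the thread.

```python
from urllib.parse import quote, unquote_to_bytes


def request_path_for_raw_byte(raw_path: bytes, page_encoding: str) -> str:
    """Hypothetical helper: interpret raw path bytes in the page encoding,
    then transcode to UTF-8 and percent-encode, as the test results describe."""
    text = raw_path.decode(page_encoding)          # raw byte -> character
    return quote(text.encode("utf-8"), safe="/")   # character -> UTF-8 -> %XX


def is_valid_utf8_escape(escaped_path: str) -> bool:
    """Hypothetical helper: report whether the percent-escapes in a path
    decode to a valid UTF-8 byte sequence."""
    try:
        unquote_to_bytes(escaped_path).decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Raw byte 0xFC in an ISO-8859-1 page is U+00FC.
print(request_path_for_raw_byte(b"/D\xfcrst/", "iso-8859-1"))
# The same raw byte in an ISO-8859-5 page is U+045C.
print(request_path_for_raw_byte(b"/D\xfcrst/", "iso-8859-5"))
# A stand-alone %FC is not valid UTF-8, whatever the page encoding is.
print(is_valid_utf8_escape("/D%FCrst/"))
print(is_valid_utf8_escape("/D%C3%BCrst/"))
```

Under these assumptions the raw byte yields different request paths depending on the page encoding, while the check on "%FC" never consults the page encoding at all, which is the point made in the reply above.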
Received on Friday, 22 July 2011 01:10:06 UTC