- From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Date: Fri, 22 Jul 2011 13:42:59 +0200
- To: Jungshik SHIN (신정식) <jshin1987+w3@gmail.com>
- Cc: Chris Weber <chris@lookout.net>, "Phillips, Addison" <addison@lab126.com>, "public-iri@w3.org" <public-iri@w3.org>
Jungshik SHIN (신정식), Fri, 22 Jul 2011 00:46:18 -0700:

>>>> The point of this was to test the display as Martin had, but using
>>>> unescaped bytes. From the results of Test 3 it looks like Firefox,
>>>> Chrome, and Safari all check for "UTF8ness" in the query component
>>>> when displaying the IRI in spite of the page encoding, hence you can
>>>> visually see the U+FF21 FULLWIDTH LATIN CAPITAL LETTER A.
>>>
>>> Which I consider to be a serious bug in handling an IRI. In theory,
>>> the characters in the HTML document are converted to a sequence of
>>> Unicode characters when the page is parsed. I should have three
>>> Unicode code points in the query portion of the IRI in the above
>>> href. They happen to be encoded using three ISO-8859-1 bytes. But
>>> they just as well could be encoded asA
>>
>> Martin's test showed that even the path component containing
>> "%C3%BC" would be percent-decoded and displayed as UTF-8... even
>> when the page encoding was declared as iso-8859-1.

With Opera and IE8 as exceptions when it comes to display. However,
even Opera and IE8 execute it as UTF-8.

> That's the correct and expected behavior. The path part is always
> assumed to be in UTF-8 regardless of the referrer page encoding. The
> query part is a different story.

I believed that the issue for debate was how %FC should be displayed
and handled.

However, while it would give the best user experience to *display* and
*handle* the %FC in Martin's test as %C3%BC, it might also be
considered a feature that href="D%FCrst" in Martin's test does not
work. E.g. if Martin's page were converted from a legacy encoding to
UTF-8, then href="D%FCrst" would stop working even if it *had* worked
in the legacy-encoded page.

Chris, maybe you could show an example of when it would be a problem
if validators warned against using non-UTF-8-based percent encodings?
-- 
Leif Halvard Silli
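
For illustration only, here is a minimal Python sketch of the difference between "%C3%BC" and "%FC" discussed above, and of the kind of check a validator might apply. It uses only the standard `urllib.parse` module. The helper name `is_utf8_percent_encoded` is hypothetical and not taken from any validator or browser, and the sketch says nothing about how any particular browser actually behaves.

```python
from urllib.parse import unquote, unquote_to_bytes


def is_utf8_percent_encoded(component: str) -> bool:
    """Hypothetical validator check: do the percent-escapes decode as UTF-8?"""
    raw = unquote_to_bytes(component)  # undo percent-encoding, keep raw bytes
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


utf8_path = "D%C3%BCrst"   # u-umlaut percent-encoded from its UTF-8 bytes
legacy_path = "D%FCrst"    # u-umlaut percent-encoded from its ISO-8859-1 byte

# Decoding with the encoding that produced the escapes recovers the name.
decoded_utf8 = unquote(utf8_path)                             # UTF-8 is the default
decoded_legacy = unquote(legacy_path, encoding="iso-8859-1")

# Decoding the legacy escape as UTF-8 fails; unquote() substitutes U+FFFD.
decoded_mismatch = unquote(legacy_path)

print(decoded_utf8, decoded_legacy, repr(decoded_mismatch))
print(is_utf8_percent_encoded(utf8_path), is_utf8_percent_encoded(legacy_path))
```

Under this check, href="D%FCrst" would be flagged while href="D%C3%BCrst" would pass, which is the warning behaviour the question to Chris is about.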
Received on Friday, 22 July 2011 11:43:45 UTC