Date: Wed, 26 Feb 1997 15:04:48 +0100 (MET)
From: "Martin J. Duerst" <email@example.com>
To: Jonathan Rosenne <Jonathan_Rosenne@CompuServe.com>
Cc: URI List <firstname.lastname@example.org>
Subject: Re: URL internationalization!
In-Reply-To: <199702251306_MC2-11B1-87E5@compuserve.com>
Message-Id: <Pine.SUN.3.95q.970226145545.245G-100000@enoshima>

On Tue, 25 Feb 1997, Jonathan Rosenne wrote:

> > As an example,
> > let's take a resource name with a G with breve (U+011E). Let's
> > assume that on the server, resource names are encoded in iso-8859-3.
> > Then the G with breve appears as %AB in a well-formed
> > URL. Now suppose somebody put that URL into an HTML document
> > that is encoded in iso-8859-3, in 8-bit form (i.e. the URL contains
> > the octet 0xAB for the G with breve character), and that that
> > document is correctly tagged as iso-8859-3.
> >
> > Now assume a browser sends a request with
> > Accept-Charset: iso-8859-5
> > The server (or a proxy) translates the whole document from
> > iso-8859-3 to iso-8859-5 to honor the request of the browser.
> > The G with breve gets changed to 0xD0. The client receives
> > the 0xD0. If it "behaves the same as if it had received the
> > corresponding %XX", i.e. %D0, the URL will not work at all.
>
> I don't understand. What if the user uses 8859-8, which has no G-breve? I
> mean, what if it says Accept-Charset: iso-8859-8?

Then this depends on the sophistication of the transcoding server/proxy.
For (i18n) HTML, the obvious solution is to replace the G-breve with
&#286;, the numeric character reference using the decimal value of
U+011E. For formats other than HTML, we might be out of luck. The
server/proxy may convert it to a sequence %HH%HH corresponding to the
G-breve in UTF-8 (here %C4%9E), but only if it is sure that the G-breve
is part of a URL.
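To make the example concrete, here is a minimal Python sketch of the octets involved. It assumes, as in the quoted example, that the server stores resource names in ISO-8859-3; the variable names are illustrative only. Note that ISO-8859-5 is a Cyrillic charset without a G-breve; the value 0xD0 quoted above is the position of G-breve in ISO-8859-9 (Latin-5), so the sketch transcodes to iso8859_9 to reproduce that octet.

```python
from urllib.parse import quote

G_BREVE = "\u011e"  # U+011E LATIN CAPITAL LETTER G WITH BREVE

# Assumption from the example: the server stores resource names in
# ISO-8859-3, so the well-formed URL carries the escape for that octet.
raw_octet = G_BREVE.encode("iso8859_3")              # b'\xab'
url_escape = quote(raw_octet)                        # '%AB'

# A proxy transcodes the 8-bit HTML document from ISO-8859-3 to another
# charset. iso8859_9 (Latin-5) is used because it maps G-breve to 0xD0.
transcoded_octet = raw_octet.decode("iso8859_3").encode("iso8859_9")
client_escape = quote(transcoded_octet)              # '%D0'

# A client treating the raw octet like the corresponding %XX now asks
# for a different resource name than the one the server has.
url_still_works = client_escape == url_escape        # False

# Target charset without G-breve (ISO-8859-8): an HTML-aware transcoder
# can fall back to a decimal numeric character reference.
html_fallback = G_BREVE.encode("iso8859_8", errors="xmlcharrefreplace")

# If the transcoder knows the character is inside a URL, it could emit
# the escape sequence for the UTF-8 octets instead.
utf8_escape = quote(G_BREVE)                         # '%C4%9E'

if __name__ == "__main__":
    print(url_escape, client_escape, url_still_works)
    print(html_fallback.decode("ascii"), utf8_escape)
```

Running it prints "%AB %D0 False" and then "&#286; %C4%9E". The character-reference fallback works on every unrepresentable character in the document, whereas the UTF-8 escape requires knowing which characters belong to a URL.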
But it is much more difficult to decide what could be a URL in an
arbitrary format than to replace all unrepresentable characters with
numeric character references in HTML (which can be done irrespective of
whether they are part of a URL or something else). This is another
reason why we should be careful with the introduction of natively
encoded URLs, and why I am refraining, for the moment, from fully
proposing it.

Regards,
Martin.