- From: William A. Rowe, Jr. <wrowe@rowe-clan.net>
- Date: Mon, 08 Aug 2005 05:15:40 -0500
- To: Julian Reschke <julian.reschke@gmx.de>
- CC: Robert Collins <robertc@robertcollins.net>, HTTP Working Group <ietf-http-wg@w3.org>
Julian Reschke wrote:
>
> Robert Collins wrote:
>
>> On Thu, 2005-08-04 at 12:19 +0200, Julian Reschke wrote:
>>
>>> Robert Collins wrote:
>>> That being said; the *best* way (if you control the server) to embed
>>> non ASCII data into a URI usually is to UTF-8 encode, then
>>> percent-escape.
>>
>> What I'm asking for is current RFC or std document that specifies this.
>> The reference for Location is because Location is the header I currently
>> need to ensure correct encoding of, and its definition is 'absoluteURI'
>> in rfc2616, which is incorporated from ... and thus we trace through to
>
> The rules for the URI in the Location header are exactly the same as for
> the HTTP request itself, so I'm still not sure where there would be a
> difference.
>
>> std66 which passes the buck for the canonical pre-percent-escape
>> encoding back to the standard defining the URI scheme, which is rfc2616.
>> Full circle and no stance taken.
>
> There's no single encoding that will work for any server. There may be
> RFCs that recommend UTF-8 (possibly RFC3987), but these do not
> normatively effect RFC2616-compliant servers.

First, RFC 2616 predates STD 66, and STD 66 doesn't override the conclusions of that RFC. Section 3.2.3 of RFC 2616 spells it out: to a web server, the URI is an opaque series of octets, with a few specific exceptions. The web server has no opinion.

That said, WinNT's filesystem, for example, is truly Unicode, and Apache 2.0 treats it as a UTF-8 filesystem for resource names. The typical *nix system today may in fact use UTF-8 file names, but does not enforce them (they remain opaque octets to the POSIX layer).

It's entirely up to the implementor what to serve for a given URI. Convention is developing around UTF-8 URI names, because no header in the HTTP spec further defines how to interpret the URI (the request or response body, yes; the URI, no). But again, in HTTP/1.1, do as you like, as long as you escape the non-ASCII and reserved octets.
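A minimal sketch in Python (not from the thread) of the convention discussed above: encode a resource name as UTF-8, then percent-escape the non-ASCII and reserved octets. The path `/docs/café menu.html` and the helper names are hypothetical. The last step illustrates the "opaque octets" point: the server only receives bytes, so a server assuming a different charset (here Latin-1) maps them to a different name.

```python
from urllib.parse import quote, unquote_to_bytes


def escape_path(name: str) -> str:
    """UTF-8 encode the name, then percent-escape every octet outside the
    unreserved set; '/' is kept literal as the path separator."""
    return quote(name, safe="/", encoding="utf-8")


def raw_octets(path: str) -> bytes:
    """What an octet-opaque server sees: unescaped bytes, with no charset."""
    return unquote_to_bytes(path)


path = escape_path("/docs/café menu.html")
print(path)                      # /docs/caf%C3%A9%20menu.html

octets = raw_octets(path)
print(octets.decode("utf-8"))    # /docs/café menu.html   (UTF-8 server)
print(octets.decode("latin-1"))  # /docs/cafÃ© menu.html  (Latin-1 server)

# Reserved characters inside a name are escaped too.
print(escape_path("a?b#c"))      # a%3Fb%23c
```

Nothing in the escaped URI itself records that UTF-8 was used; the two decodes differ only in the charset each server assumes.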
STD 66 didn't add an explicit charset for us, and that's unlikely to happen until yet another RFC for HTTP.

Bill
Received on Monday, 8 August 2005 10:16:48 UTC