Re: location uri, ucs and the http scheme definition.

Julian Reschke wrote:
> 
> Robert Collins wrote:
> 
>> On Thu, 2005-08-04 at 12:19 +0200, Julian Reschke wrote:
>>
>>> Robert Collins wrote:

>>> That being said; the *best* way (if you control the server) to embed 
>>> non ASCII data into a URI usually is to UTF-8 encode, then 
>>> percent-escape.
>>
>> What I'm asking for is current RFC or std document that specifies this.
>> The reference for Location is because Location is the header I currently
>> need to ensure correct encoding of, and its definition is 'absoluteURI'
>> in rfc2616, which is incorporated from ... and thus we trace through to
> 
> The rules for the URI in the Location header are exactly the same as for 
> the HTTP request itself, so I'm still not sure where there would be a 
> difference.
> 
>> std66 which passes the buck for the canonical pre-percent-escape
>> encoding back to the standard defining the URI scheme, which is rfc2616.
>> Full circle and no stance taken.
> 
> There's no single encoding that will work for any server. There may be 
> RFCs that recommend UTF-8 (possibly RFC3987), but these do not 
> normatively affect RFC2616-compliant servers.

First, rfc2616 predates std66, and std66 doesn't override the
conclusions of that rfc.  Section 3.2.3 spells it out: to a webserver
the URI is an opaque series of octets, with a few specific exceptions.
The webserver has no opinion on what charset those octets are in.
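
To make the "opaque octets" point concrete, here is a rough sketch in
Python (my choice of language; the resource name is made up for
illustration).  The same characters escaped from two different charsets
give two different octet sequences, and nothing in the URI itself says
which charset was meant.

```python
from urllib.parse import quote, unquote_to_bytes

# Hypothetical resource name containing one non-ASCII character (e-acute).
name = "caf\u00e9"

# Percent-escape the octets produced by two different charsets.
utf8_path = "/" + quote(name.encode("utf-8"))
latin1_path = "/" + quote(name.encode("latin-1"))

print(utf8_path)    # /caf%C3%A9
print(latin1_path)  # /caf%E9

# Unescaping yields raw octets only; the two paths are simply different
# byte strings, with no charset label attached to either.
same_octets = unquote_to_bytes(utf8_path) == unquote_to_bytes(latin1_path)
print(same_octets)  # False
```

A server comparing octets would treat these as two unrelated resources
unless the implementor chose to map them together.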

That said, WinNT's filesystem, for example, is truly unicode, and Apache
2.0 treats it as a utf-8 filesystem for resource names.  The typical
*nix system today may in fact use utf-8 file names, but does not
enforce them (they remain opaque octets to the posix layer).  It's
entirely up to the implementor what to serve based on a URI.

Convention is developing around utf-8 URI names, because no header in
the http spec further defines how to interpret the URI (the request or
response body, yes; the URI, no).  But again, in HTTP/1.1 do as you
like, as long as you escape the non-ascii and reserved characters.
Std66 didn't add an explicit charset for us, and one is unlikely to
arrive until yet another rfc for HTTP.
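
For what it's worth, the utf-8 convention (encode to utf-8, then
percent-escape) can be sketched as follows.  This is only an
illustration under my own assumptions: the helper name
to_location_path, the host example.org and the sample path are all
hypothetical, and it treats "/" as the only character left unescaped
besides the unreserved set.

```python
from urllib.parse import quote

def to_location_path(path: str) -> str:
    """Encode a path as utf-8, then percent-escape it (hypothetical helper).

    "/" is kept as the segment separator; every other reserved or
    non-ASCII character is escaped.
    """
    return quote(path, safe="/", encoding="utf-8")

# Assumed example values, not taken from any real server.
escaped = to_location_path("/docs/r\u00e9sum\u00e9 2005.txt")
location_header = "Location: http://example.org" + escaped

print(escaped)          # /docs/r%C3%A9sum%C3%A9%202005.txt
print(location_header)
```

Whether the receiving server maps those octets back to the same
resource is still the implementor's call, as noted above.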

Bill

Received on Monday, 8 August 2005 10:16:48 UTC