Re: location uri, ucs and the http scheme definition.

Robert Collins wrote:
> On Thu, 2005-08-04 at 12:19 +0200, Julian Reschke wrote:
> 
>>Robert Collins wrote:
>>
>>>So: what is the correct encoding approach for non US-ASCII characters in
>>>HTTP URIs in Location: ?
>>
>>What does this have to do with "Location"? A URI never ever contains 
>>non-ASCII characters, no matter where it appears.
> 
> 
> Uhm. URIs do contain non-ASCII characters quite commonly. Such as
> http://example.com/~usérname

No, by definition that is not a URI. Non-ASCII characters are not 
allowed in URIs, that's why you need to escape them.

> Note that that cannot be unambiguously percent escaped from the rules
> contained in rfc2616 and std66. The missing link is the encoding to use
> before percent escaping, for which you suggest a heuristic..

Yes. Many servers use UTF-8, but in general a client cannot assume this 
is the case.
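To make the ambiguity concrete, here is a small illustrative sketch in Python. It uses the "/~usérname" path from the example above. The choice of ISO-8859-1 as the competing charset is only an illustrative assumption; any non-UTF-8 charset would show the same problem. The same characters give two different, equally valid URIs, depending on which encoding was applied before percent-escaping:

```python
from urllib.parse import quote, unquote

# Path taken from the example in this thread.
PATH = "/~usérname"

def escape_with(charset: str) -> str:
    """Encode PATH to octets using `charset`, then percent-escape the octets.

    quote() leaves unreserved characters (letters, digits, "-._~") and "/"
    untouched and writes every other octet as %XX.
    """
    return quote(PATH.encode(charset), safe="/")

utf8_form = escape_with("utf-8")         # "/~us%C3%A9rname"
latin1_form = escape_with("iso-8859-1")  # "/~us%E9rname"

# Both are syntactically valid URIs, but they differ. The server has to
# know which charset was used in order to recover "usérname".
print(utf8_form)
print(latin1_form)

# Decoding with the wrong charset does not round-trip: the lone 0xE9 octet
# is not valid UTF-8, so unquote() substitutes U+FFFD for it.
print(unquote(latin1_form, encoding="utf-8"))
```

Nothing in the URI itself records which charset was used. That is the missing link described above.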

>>That being said; the *best* way (if you control the server) to embed non 
>>ASCII data into a URI usually is to UTF-8 encode, then percent-escape.
> 
> 
> What I'm asking for is a current RFC or STD document that specifies this.
> The reference for Location is because Location is the header I currently
> need to ensure correct encoding of, and its definition is 'absoluteURI'
> in rfc2616, which is incorporated from ... and thus we trace through to

The rules for the URI in the Location header are exactly the same as for 
the HTTP request itself, so I'm still not sure where there would be a 
difference.

> std66 which passes the buck for the canonical pre-percent-escape
> encoding back to the standard defining the URI scheme, which is rfc2616.
> Full circle and no stance taken.

There's no single encoding that will work for any server. There may be 
RFCs that recommend UTF-8 (possibly RFC3987), but these do not 
normatively affect RFC2616-compliant servers.
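For a server that does choose UTF-8, the following is a rough sketch of the IRI-to-URI mapping along the lines of RFC 3987, Section 3.1. Each non-ASCII character is converted to UTF-8, and each resulting octet is written as %HH. ASCII characters are copied unchanged. This is a simplification: it ignores the IDNA alternative for host names and bidi considerations. The function name iri_to_uri and the sample query string are hypothetical. As noted above, this only shows what such a server would put in Location; it does not bind RFC2616-compliant servers.

```python
def iri_to_uri(iri: str) -> str:
    """Simplified RFC 3987-style mapping from an IRI to a URI (hypothetical helper).

    Non-ASCII characters are UTF-8 encoded and each octet is written as %HH.
    ASCII characters (including reserved delimiters and any existing %XX
    escapes) are copied unchanged, so the URI syntax is preserved.
    """
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)

# Hypothetical Location value built from the example in this thread.
location = iri_to_uri("http://example.com/~usérname?q=café")
print("Location: " + location)
```

Only the non-ASCII characters are touched, so delimiters such as "?" and "/" keep their meaning. Percent-escaping the whole string would not preserve them.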


Best regards, Julian

Received on Monday, 8 August 2005 07:20:35 UTC