Re: location uri, ucs and the http scheme definition. from Julian Reschke on 2005-08-08 (ietf-http-wg@w3.org from July to September 2005)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Mon, 08 Aug 2005 12:24:22 +0200
To: "William A. Rowe, Jr." <wrowe@rowe-clan.net>
CC: Robert Collins <robertc@robertcollins.net>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <42F732D6.7050006@gmx.de>

William A. Rowe, Jr. wrote:
> 
> Julian Reschke wrote:
> 
>>
>> Robert Collins wrote:
>>
>>> On Thu, 2005-08-04 at 12:19 +0200, Julian Reschke wrote:
>>>
>>>> Robert Collins wrote:
> 
> 
>>>> That being said, the *best* way (if you control the server) to embed 
>>>> non-ASCII data into a URI usually is to UTF-8 encode, then 
>>>> percent-escape.
>>>
>>>
>>> What I'm asking for is a current RFC or std document that specifies this.
>>> The reference for Location is because Location is the header I currently
>>> need to ensure correct encoding of, and its definition is 'absoluteURI'
>>> in rfc2616, which is incorporated from ... and thus we trace through to
>>
>>
>> The rules for the URI in the Location header are exactly the same as 
>> for the HTTP request itself, so I'm still not sure where there would 
>> be a difference.
>>
>>> std66 which passes the buck for the canonical pre-percent-escape
>>> encoding back to the standard defining the URI scheme, which is rfc2616.
>>> Full circle and no stance taken.
>>
>>
>> There's no single encoding that will work for every server. There may be 
>> RFCs that recommend UTF-8 (possibly RFC3987), but these do not 
>> normatively affect RFC2616-compliant servers.
> 
> 
> First, rfc2616 predates std66, which doesn't override the conclusions of
> that rfc; section 3.2.3 spells it out: to a webserver the URI is an
> opaque series of octets with a few specific exceptions.  The webserver
> has no opinion.

RFC2616 normatively refers to RFC2396 for the definitions of URI 
components 
(<http://greenbytes.de/tech/webdav/rfc2616.html#rfc.section.3.2.1>). And 
RFC2396 did not allow non-ASCII characters in URIs, either.
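
To make the "UTF-8 encode, then percent-escape" approach quoted above 
concrete, here is a minimal sketch using Python's standard urllib.parse. 
The helper name escape_path_segment and the sample string are made up for 
illustration; nothing in RFC2616 or RFC2396 mandates this choice of 
encoding.

```python
from urllib.parse import quote, unquote_to_bytes


def escape_path_segment(segment: str) -> str:
    """Hypothetical helper: UTF-8 encode, then percent-escape.

    Every octet outside the unreserved set is escaped, so the result
    contains only ASCII characters.
    """
    octets = segment.encode("utf-8")   # step 1: characters -> octets
    return quote(octets, safe="")      # step 2: octets -> %XX escapes


escaped = escape_path_segment("résumé")
print(escaped)  # r%C3%A9sum%C3%A9

# The recipient only sees octets; which characters they stand for
# depends on the encoding it assumes.
raw = unquote_to_bytes(escaped)
print(raw.decode("utf-8"))       # résumé
print(raw.decode("iso-8859-1"))  # rÃ©sumÃ©
```

The last two lines show why the choice matters: the same escaped octets 
decode to different characters when the recipient assumes ISO-8859-1 
instead of UTF-8.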

> That said; for example WinNT's filesystem is truly unicode, which Apache
> 2.0, for example, treats as a utf-8 filesystem for resource names.  The
> typical *nix system today may in fact use utf-8 file names, but does
> not enforce them (they remain opaque octets to the posix layer).  It's
> entirely up to the implementor what to serve based on a URI.

Yes. That's a problem. See 
<http://greenbytes.de/tech/webdav/draft-reschke-webdav-url-constraints-latest.html> 
for a work-in-progress attempt to fix things at least for WebDAV.

> ...

Best regards, Julian

Received on Monday, 8 August 2005 10:24:40 UTC