
Re: location uri, ucs and the http scheme definition.

From: William A. Rowe, Jr. <wrowe@rowe-clan.net>
Date: Mon, 08 Aug 2005 05:56:01 -0500
Message-Id: <6.2.1.2.2.20050808054819.071e08a0@pop3.rowe-clan.net>
To: Julian Reschke <julian.reschke@gmx.de>
Cc: Robert Collins <robertc@robertcollins.net>, HTTP Working Group <ietf-http-wg@w3.org>

At 05:24 AM 8/8/2005, Julian Reschke wrote:
>William A. Rowe, Jr. wrote:
>
>>>There's no single encoding that will work for any server. There may be RFCs that recommend UTF-8 (possibly RFC3987), but these do not normatively affect RFC2616-compliant servers.
>>
>>First, RFC 2616 predates STD 66, which doesn't override the conclusions of
>>that RFC. Section 3.2.3 spells it out: to a web server the URI is an
>>opaque series of octets, with a few specific exceptions.  The web server
>>has no opinion.
>
>RFC2616 normatively refers to RFC2396 for the definitions of URI components (<http://greenbytes.de/tech/webdav/rfc2616.html#rfc.section.3.2.1>). And RFC2396 did not allow non-ASCII characters in URIs, either.

Julian, stop it already; we -know- that only non-reserved,
ASCII characters can be transmitted over the wire.  The
question Robert raises is what charset is normative prior
to it being encoded for transmission over the wire.  RFC 2396
nowhere says that the %-escape decoded value must be ASCII
for all components.  Quoting RFC 2396 section 2:

  "Within a URI, characters are either used as delimiters, or to
   represent strings of data (octets) within the delimited portions.
   Octets are either represented directly by a character (using the
   US-ASCII character for that octet [ASCII]) or by an escape encoding."


ahhh... "or by an escape encoding" - which is all Robert has asked
for since the conversation started ;-)  Yet we still don't have
a definitive charset; the escaped data is opaque octets.
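
To make the point concrete, here is a minimal sketch in Python (not anything
from RFC 2396 or RFC 2616) of why the escaped octets alone don't identify a
charset. The path "/café" and the helper names escape_path, decode_octets and
interpret are made up for illustration; only the urllib.parse calls are real
library functions.

```python
from typing import Optional
from urllib.parse import quote, unquote_to_bytes


def escape_path(path: str, charset: str) -> str:
    """Encode the path in the given charset, then %-escape the non-ASCII octets."""
    return quote(path, safe="/", encoding=charset)


def decode_octets(uri_path: str) -> bytes:
    """Undo the %-escapes; the result is raw octets, not text."""
    return unquote_to_bytes(uri_path)


def interpret(octets: bytes, charset: str) -> Optional[str]:
    """Try to read the octets as text in a charset; None if they are invalid."""
    try:
        return octets.decode(charset)
    except UnicodeDecodeError:
        return None


# The same name yields two different wire forms, depending on the sender's charset.
utf8_form = escape_path("/café", "utf-8")          # '/caf%C3%A9'
latin1_form = escape_path("/café", "iso-8859-1")   # '/caf%E9'

# A receiver sees only octets and has to guess how to read them.
for wire in (utf8_form, latin1_form):
    octets = decode_octets(wire)
    print(wire, octets, interpret(octets, "utf-8"), interpret(octets, "iso-8859-1"))
```

Both wire forms are legal escaped URIs. Nothing in the octets themselves
says which charset produced them, so a server can only guess or impose its
own convention.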

>>That said, WinNT's filesystem, for example, is truly Unicode, which Apache
>>2.0 treats as a UTF-8 filesystem for resource names.  The
>>typical *nix system today may in fact use UTF-8 file names, but does
>>not enforce them (they remain opaque octets to the POSIX layer).  It's
>>entirely up to the implementor what to serve based on a URI.
>
>Yes. That's a problem. See <http://greenbytes.de/tech/webdav/draft-reschke-webdav-url-constraints-latest.html> for a work-in-progress attempt to fix things at least for WebDAV.

Ack :)  The more comprehensive solution is, of course, HTTP/1.2,
although I know some have their hearts set on HTTP-NG first.

Bill
Received on Monday, 8 August 2005 10:58:48 GMT
