Re: location uri, ucs and the http scheme definition. from Robert Collins on 2005-08-23 (ietf-http-wg@w3.org from July to September 2005)

From: Robert Collins <robertc@robertcollins.net>
Date: Tue, 23 Aug 2005 01:36:24 +0000
To: Jamie Lokier <jamie@shareable.org>
Cc: "William A. Rowe, Jr." <wrowe@rowe-clan.net>, Julian Reschke <julian.reschke@gmx.de>, HTTP Working Group <ietf-http-wg@w3.org>
Message-Id: <1124761004.31081.42.camel@localhost.localdomain>

On Mon, 2005-08-22 at 14:20 +0100, Jamie Lokier wrote:
> Robert Collins wrote:
> > > Ack :)  The more comprehensive solution of course, HTTP/1.2, 
> > > although I know some have their hearts set on HTTP-NG first.
> > 
> > I'd be happy with a HTTP/1.1 errata that updates the http:// scheme to
> > declare it as utf8 before the escape encoding is done.
> 
> Not reasonable.
> 
> There are a significant number of HTTP/1.1-compliant servers which
> work with URLs that are derived from text in other encodings, and
> there are servers where the encoding depends on the URL (because the
> server's job is to pass along the URL unmodified to individual
> resource handlers).

I put it to you that this has occurred because of the lack of guidance in
RFC 2616. Even though we can't retroactively change the standard, adding
the STD 66 recommendation as a WG recommendation would be a positive
step IMO.

> Because of that, proxies must continue to work with URIs that contain
> arbitrary %-escaped sequences, without filtering or changing them
> (even if they don't represent valid UTF-8), servers must continue to be
> able to serve documents containing such URIs, and clients must
> continue to be able to retrieve documents using those URIs.

Sure - anything compliant with any of the URI standards must continue to
work with any %-escaped URI representation. But it would be nice to
document what *should* work.
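
To make the distinction concrete, here is a minimal sketch, assuming Python's
standard urllib.parse module. The function name describe_escapes is
hypothetical and only for illustration. It leaves the escaped octets alone
(as a proxy must) and merely reports whether they happen to be valid UTF-8:

```python
from urllib.parse import unquote_to_bytes

def describe_escapes(uri_path):
    """Return (octets, text) for a %-escaped path.

    octets is always returned unchanged, since the escapes may be an
    opaque application-specific byte stream. text is the UTF-8 decoding,
    or None when the octets are not valid UTF-8.
    """
    octets = unquote_to_bytes(uri_path)
    try:
        text = octets.decode("utf-8")
    except UnicodeDecodeError:
        text = None  # still a legal URI, just not UTF-8 text
    return octets, text

# "/caf%C3%A9" decodes as UTF-8; "/caf%E9" (Latin-1 derived) does not,
# but both sets of octets are passed through untouched.
utf8_case = describe_escapes("/caf%C3%A9")
latin1_case = describe_escapes("/caf%E9")
```

Neither input is rejected here; the sketch only shows that "must keep
working" and "is what we recommend" can be checked separately.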

> In principle, the escape-encoding represents an application-specific
> opaque octet stream, and it need not represent "characters" at all.

For URIs in general, yes, but STD 66 section 2.5 does provide guidance
for this...
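
As a rough illustration of that guidance (encode the characters as UTF-8
first, then percent-encode the octets that are not unreserved), here is a
small sketch using Python's urllib.parse.quote. The helper names escape_utf8
and escape_latin1 are hypothetical; the Latin-1 variant is included only to
show what the "other encodings" case looks like on the wire:

```python
from urllib.parse import quote

def escape_utf8(text):
    # STD 66 section 2.5 style: UTF-8 octets first, then %-escape.
    return quote(text, safe="/", encoding="utf-8")

def escape_latin1(text):
    # Legacy behaviour for comparison: Latin-1 octets, then %-escape.
    return quote(text, safe="/", encoding="latin-1")

# U+00C0 (LATIN CAPITAL LETTER A WITH GRAVE) under each convention.
as_utf8 = escape_utf8("/\u00c0")      # "/%C3%80"
as_latin1 = escape_latin1("/\u00c0")  # "/%C0"
```

The same character yields two different URIs, which is why it would help
to say which one http URIs *should* use.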
>     - How non-ASCII characters in documents in places such as an
>       "href" attribute are converted into proper URIs for HTTP.
> 
>     - How non-ASCII characters in forms are converted into proper
>       URI query parts.  (This is covered somewhat already in HTML 4).
> 
>     - How non-ASCII characters in other parts of a typical client's
>       user interface such as the "location bar", are converted into
>       proper URLs for HTTP document retrieval.

Which, given we started this thread on the Location header in http,
which sets the user interface location bar ... seems relevant to me.

Anyway, what I'd like to see is some reference suggesting a best
practice for HTTP URIs, if one can be defined. Using whatever
guidelines are present for the next HTTP protocol would be ideal ;0.

Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.

Received on Tuesday, 23 August 2005 06:48:53 UTC