Re: location uri, ucs and the http scheme definition. from Jamie Lokier on 2005-08-22 (ietf-http-wg@w3.org from July to September 2005)

From: Jamie Lokier <jamie@shareable.org>
Date: Mon, 22 Aug 2005 14:20:08 +0100
To: Robert Collins <robertc@robertcollins.net>
Cc: "William A. Rowe, Jr." <wrowe@rowe-clan.net>, Julian Reschke <julian.reschke@gmx.de>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <20050822132008.GB4461@mail.shareable.org>

Robert Collins wrote:
> > Ack :)  The more comprehensive solution of course, HTTP/1.2, 
> > although I know some have their hearts set on HTTP-NG first.
> 
> I'd be happy with a HTTP/1.1 errata that updates the http:// scheme to
> declare it as utf8 before the escape encoding is done.

Not reasonable.

There are a significant number of HTTP/1.1-compliant servers which
work with URLs that are derived from text in other encodings, and
there are servers where the encoding depends on the URL (because the
server's job is to pass along the URL unmodified to individual
resource handlers).

Because of that, proxies must continue to work with URIs that contain
arbitrary %-escaped sequences, without filtering or changing them
(even if they don't represent valid UTF-8), servers must continue to be
able to serve documents containing such URIs, and clients must
continue to be able to retrieve documents using those URIs.

In principle, the escape-encoding represents an application-specific
opaque octet stream, and it need not represent "characters" at all.
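
As a rough illustration of that point, here is a minimal Python sketch
(the paths and helper names are made up for the example, not taken
from any real server).  It decodes the %-escapes of two paths to raw
octets and checks whether those octets happen to be valid UTF-8.  Both
are well-formed URI paths; only one of them decodes as UTF-8.

```python
from urllib.parse import unquote_to_bytes


def escaped_octets(path: str) -> bytes:
    """Return the raw octets that the %-escapes in `path` stand for."""
    return unquote_to_bytes(path)


def is_valid_utf8(octets: bytes) -> bool:
    """Report whether the octets form a valid UTF-8 sequence."""
    try:
        octets.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical paths: the same word escaped from ISO-8859-1 and from UTF-8.
latin1_path = "/caf%E9"
utf8_path = "/caf%C3%A9"

for path in (latin1_path, utf8_path):
    octets = escaped_octets(path)
    print(path, octets, is_valid_utf8(octets))
```

An intermediary that rejected or rewrote the first path because it is
not UTF-8 would break a server that expects exactly those octets.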

An appropriate place to define UTF-8 as the encoding to use would be
in document standards, such as XML and HTML, as this question really
is about how to convert character sequences (in documents and user
interfaces) that feature non-ASCII characters and purport to be URIs
(but aren't really URIs) into well-formed URIs for network operations.

The places where it's useful to specify a character encoding are:

    - How non-ASCII characters in documents in places such as an
      "href" attribute are converted into proper URIs for HTTP.

    - How non-ASCII characters in forms are converted into proper
      URI query parts.  (This is covered somewhat already in HTML 4).

    - How non-ASCII characters in other parts of a typical client's
      user interface, such as the "location bar", are converted into
      proper URLs for HTTP document retrieval.
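
A sketch of what such a conversion rule might look like on the client
side, assuming the document standard picks UTF-8: every non-ASCII
character is encoded as UTF-8 and each resulting octet is %-escaped,
while ASCII characters and any escapes already present are left
untouched.  The function name `to_uri` and the example reference are
hypothetical, and a non-ASCII host name would need separate treatment
that is not attempted here.

```python
def to_uri(reference: str) -> str:
    """Escape non-ASCII characters as %-escaped UTF-8 octets.

    ASCII characters, including existing %XX escapes, pass through
    unchanged, so octets the author already escaped are preserved.
    """
    parts = []
    for ch in reference:
        if ord(ch) < 128:
            parts.append(ch)
        else:
            # Assumption: UTF-8 is the encoding chosen by the document standard.
            parts.extend("%{:02X}".format(octet) for octet in ch.encode("utf-8"))
    return "".join(parts)


# Hypothetical href value with one raw non-ASCII character (U+00E9)
# and one escape that was already present in the document.
href = "http://example.org/caf\u00e9?q=%E9"
print(to_uri(href))
```

Note that the pre-existing "%E9" is passed along as-is; only the text
that was not yet a URI is given an encoding.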

-- Jamie

Received on Monday, 22 August 2005 13:20:31 UTC