Re: proposed charter items for possible URI working group from William A. Rowe, Jr. on 2002-07-20 (uri@w3.org from July 2002)

From: William A. Rowe, Jr. <wrowe@rowe-clan.net>
Date: Sat, 20 Jul 2002 18:50:40 -0500
To: "Chris Haynes" <chris@harvington.org.uk>
Cc: <uri@w3.org>, <www-i18n-comments@w3.org>
Message-Id: <5.1.0.14.2.20020720184310.03354ea0@pop3.rowe-clan.net>

At 02:13 PM 7/20/2002, Chris Haynes wrote:

>  "Larry Masinter" wrote:
> >
> > I propose to NOT fold in the IRI definition, but to allow
> > it to proceed along standards track at its own pace. The
> > revised URI standard can note the IRI work as a separate
> > effort.
>
>I don't want to pre-empt the IRI experts, but I have a hunch that
>giving the URI %HH escapes a formal definition of representing UTF-8
>(rather than the present undefined encoding) would help the IRI
>activity enormously. My guess is it would leave them free to define a
>canonic mapping of IRIs into URIs without requiring any further URI
>spec. changes.

There is an underlying flaw in that concept.  Many browsers and servers
today pass, or infer, non-UTF-8 input in the URI or header fields.
Many others pass UTF-8 data.  Because the specification states no preference
and gives no definition, the server is responsible for inferring the meaning
from the context of the client's request, and for returning its response in kind.
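
To illustrate the kind of inference described above, here is a minimal
sketch in Python.  It is not taken from any particular server; the function
name guess_decode and the ISO-8859-1 fallback are assumptions made for the
example.  It shows that the same character can arrive as different %HH
octets, and that a server can only guess which encoding was meant.

```python
from typing import Tuple
from urllib.parse import unquote_to_bytes


def guess_decode(escaped: str, fallback: str = "iso-8859-1") -> Tuple[str, str]:
    """Percent-decode `escaped`, then guess the character encoding.

    Hypothetical heuristic: try strict UTF-8 first, and fall back to a
    single-byte charset (an assumption, not a rule from any spec) when the
    octets are not valid UTF-8.
    """
    raw = unquote_to_bytes(escaped)  # %HH escapes become raw octets
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode(fallback), fallback


# The same word sent by two different clients:
latin1_form = guess_decode("caf%E9")     # one octet, not valid UTF-8
utf8_form = guess_decode("caf%C3%A9")    # two octets, valid UTF-8

# The guess can also go wrong: these octets are valid in both encodings.
misread = unquote_to_bytes("caf%C3%A9").decode("iso-8859-1")
```

Both calls recover the same text here, but only because the heuristic
happened to guess right; the last line shows the same octets read as
ISO-8859-1 yielding different characters.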

>If I'm right, would this simple change be out of scope?

Making such a change by fiat would be inappropriate.  Some additional
information has to be passed by the client so that the server does not
have to infer the meaning of the high-bit octet codes.

If this were a change made in an HTTP/1.2 specification, the version number
would be the indicator that all headers and the URI itself are UTF-8 encoded.
Without a bump of the HTTP version number, it's entirely out of scope.

I'm not clear here about Larry's intent, so I will ask:

Larry, did you mean for the next document revision to spell out an
HTTP/1.2 conformance?  If not, is Chris Haynes' proposal still out of line
if he introduces an appropriate header field definition to assert that
the client and server treat the now-opaque octets %80-%FF as UTF-8?
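
As a rough sketch of how such an assertion might be consumed, the Python
below uses a made-up header field name, "URI-Charset".  No such field is
defined anywhere; the name, the function interpret_path, and the choice to
leave undeclared octets as raw bytes are all assumptions for illustration.

```python
from typing import Mapping, Union
from urllib.parse import unquote_to_bytes

# Hypothetical header field name; not defined by any specification.
HYPOTHETICAL_HEADER = "uri-charset"


def interpret_path(escaped_path: str, headers: Mapping[str, str]) -> Union[str, bytes]:
    """Decode %HH escapes as UTF-8 only when the client asserts UTF-8.

    Without the (hypothetical) assertion, the octets stay opaque and are
    returned as bytes, leaving any inference to the caller.
    """
    raw = unquote_to_bytes(escaped_path)
    # Header field names are case-insensitive, so normalize before lookup.
    declared = {k.lower(): v for k, v in headers.items()}.get(HYPOTHETICAL_HEADER)
    if declared is not None and declared.strip().lower() == "utf-8":
        # Raises UnicodeDecodeError if the client's assertion is false.
        return raw.decode("utf-8")
    return raw


asserted = interpret_path("/caf%C3%A9", {"URI-Charset": "utf-8"})
opaque = interpret_path("/caf%E9", {})
```

With the assertion present the path comes back as text; without it the
server still holds only octets, which is the situation today.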

Bill

Received on Saturday, 20 July 2002 19:53:12 UTC