Re: revised "generic syntax" internet draft

Keld Jørn Simonsen (keld@dkuug.dk)
Tue, 22 Apr 1997 13:06:31 +0200


Message-Id: <199704221106.NAA15049@dkuug.dk>
From: keld@dkuug.dk (Keld Jørn Simonsen)
Date: Tue, 22 Apr 1997 13:06:31 +0200
In-Reply-To: "Martin J. Duerst" <mduerst@ifi.unizh.ch>
To: "Martin J. Duerst" <mduerst@ifi.unizh.ch>
Subject: Re: revised "generic syntax" internet draft
Cc: John C Klensin <klensin@mci.net>, Dan Oscarsson <Dan.Oscarsson@trab.se>,

"Martin J. Duerst" writes:

> On Wed, 16 Apr 1997, Keld Jørn Simonsen wrote:
> 
> > John Klensin writes about use of UTF-8 and penalties in size 
> > and readability for various user communities. Some remarks:
> 
> > Maybe John wants to be able to use other charsets for encoding
> > an URL. I actually proposed some time ago a solution labelling
> > the encoding of the URL in a "URL-charset:" header and
> > having UTF-8 as default, and I remember somebody else also proposing
> > charset labelling - on the URL line. I have not at this time evaluated 
> > such proposals compared to Martin and François's proposals, but it
> > is clear that the intended functionality is the same - and my old
> > proposal could be seen as an extension to Martin/François - but I
> > am not sure it is necessary.
> 
> In particular, the "FORM-UTF8: Yes" I proposed is very similar
> to your proposal. To be able to label arbitrary "charset"s is
> an extension, but I don't think it is needed at this stage of
> ISO 10646 and Internet development. The way I put it usually
> is that currently, we have "chaos". There is no need to proceed
> to "labeled chaos" when we can proceed to "order" directly.
> The Universal Character Set really shows off its strength most
> directly for short and widely used strings such as URLs.
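A minimal Python 3.9+ sketch of the "chaos" described above, using only the standard urllib.parse.quote: the same path percent-encodes to different octets depending on the charset chosen, so a server that receives only the %XX escapes cannot tell which charset was meant unless there is a label or an agreed default such as UTF-8. The helper name encodings_of is hypothetical, chosen for illustration.

```python
from urllib.parse import quote


def encodings_of(text: str, charsets: list[str]) -> dict[str, str]:
    """Percent-encode the same text once per charset (hypothetical helper)."""
    # quote() first converts the str to octets in the given charset,
    # then escapes every octet outside the unreserved set as %XX.
    # "/" stays unescaped because it is in quote()'s default safe set.
    return {cs: quote(text, encoding=cs) for cs in charsets}
```

For example, encodings_of("/café", ["utf-8", "iso-8859-1"]) yields "/caf%C3%A9" for UTF-8 but "/caf%E9" for ISO-8859-1: one character, two unrelated escape sequences.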

My "URL-Charset:" header also fits the "labelled chaos" that we
already have with HTML, and with the way URLs are coded in anchors
etc. in the HTML markup. The natural thing there is that URLs are
encoded in the charset of the HTML document. So a request for such
a URL would carry the URL together with a "URL-Charset:" header
giving the charset of the HTML document. Straightforward. And we
could use equivalent mechanisms whether the URL was typed in or came
from an HTML document. Also, the responsibility for handling the
character encoding, including conversion, would lie with the server
side, which would normally be the "offender" allowing strange things
like non-ASCII URLs.
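A minimal sketch of the server-side handling described above, in Python 3.9+. The "URL-Charset:" header was only a proposal and was never standardized, so the header name as used here, the hypothetical helper decode_request_path and the UTF-8 fallback are assumptions for illustration: the server first undoes the %XX escapes to recover the raw octets, then interprets those octets in the labelled charset.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical header name taken from the proposal; not a standard header.
URL_CHARSET_HEADER = "URL-Charset"


def decode_request_path(raw_path: str, headers: dict[str, str]) -> str:
    """Decode a percent-encoded path using the (assumed) charset label.

    Falls back to UTF-8 when no label is present. An unknown label raises
    LookupError; octets invalid in that charset raise UnicodeDecodeError.
    """
    # Header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    charset = lowered.get(URL_CHARSET_HEADER.lower(), "utf-8").strip()
    # Undo the %XX escapes, giving the octets the client actually sent.
    octets = unquote_to_bytes(raw_path)
    # Server-side conversion: interpret those octets in the labelled charset.
    return octets.decode(charset)
```

With a label, decode_request_path("/caf%E9", {"URL-Charset": "ISO-8859-1"}) gives "/café"; without one, "/caf%C3%A9" is read as UTF-8 and gives the same string.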

Actually, the labelling would make it possible to solve the charset
issue in two places: at the client side as well as at the server side.
I agree that UTF-8 should be the recommendation.
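A minimal sketch of the client-side alternative, in Python 3.9+: the client converts the URL from the document's charset to UTF-8 before sending it, so the request needs no charset label and the server can simply assume UTF-8. The helper name to_utf8_url_path is hypothetical, and the sketch assumes the href is plain ASCII with its non-ASCII octets already written as %XX escapes in the document's charset.

```python
from urllib.parse import quote, unquote_to_bytes


def to_utf8_url_path(raw_path: str, document_charset: str) -> str:
    """Re-encode a percent-encoded path from the document charset to UTF-8.

    Hypothetical helper; assumes raw_path is ASCII with %XX escapes whose
    octets are in document_charset.
    """
    # Recover the document's octets, then decode them into characters.
    text = unquote_to_bytes(raw_path).decode(document_charset)
    # quote() encodes non-ASCII characters as UTF-8 by default;
    # keep "/" unescaped so the path structure is preserved.
    return quote(text, safe="/")
```

For instance, an ISO-8859-1 document containing "/caf%E9" would be requested as "/caf%C3%A9", the UTF-8 form.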

Keld