- From: Keld Jørn Simonsen <keld@dkuug.dk>
- Date: Tue, 22 Apr 1997 13:06:31 +0200
- To: "Martin J. Duerst" <mduerst@ifi.unizh.ch>
- Cc: John C Klensin <klensin@mci.net>, Dan Oscarsson <Dan.Oscarsson@trab.se>, Harald.T.Alvestrand@uninett.no, uri@bunyip.com, fielding@kiwi.ICS.UCI.EDU
"Martin J. Duerst" writes: > On Wed, 16 Apr 1997, Keld J|rn Simonsen wrote: > > > John Klensin writes about use of UTF-8 and penalties in size > > and readability for various user communities. Some remarks: > > > Maybe John wants to be able to use other charsets for encoding > > an URL. I actually proposed some time ago a solution labelling > > the encoding of the URL in a "URL-charset:" header and a > > having UTF-8 as default, and I remember somebody else also proposing > > charset labelling - on the URL line. I have not at this time evaluated > > such proposals compared to Martin and Frangois's proposals, but it > > is clear that the intended functionality is the same - and my old > > proposal could be seen as an extension to Martin/Frangois - but I > > am not sure it is necessary. > > In particular, the "FORM-UTF8: Yes" I proposed is very similar > to your proposal. To be able to label arbitrary "charset"s is > an extension, but I don't think it is needed at this stage of > ISO 10646 and Internet development. The way I put it usually > is that currently, we have "chaos". There is no need to proceed > to "labeled chaos" when we can proceed to "order" directly. > The Universal Character Set really shows off its strength most > directly for short and widely used strings such as URLs. My "URL-Charset:" header also goes along the "labelled chaos" that we already have with HTML, and then the coding of URLs in anchors etc in the HTML markup. The natural thing there is that URLs are encoded in the charset of the HTML document. So a request for the URL would then have a header with the URL and then the "URL-charset" of the HTML document. Straightforward. And we could use equivalent mechanisms whether the URL was typed in or came from a HTML document. Also the responsibility of handling the character encoding incl conversion would be at the server side, which normally would be the "offender" allowing strange things like non-ASCII URLs. 
Actually, the labelling would make it possible to solve the charset
issue in two places, both at the client and the server side.

I agree that UTF-8 should be the recommendation.

Keld
Received on Tuesday, 22 April 1997 07:07:00 UTC