
Re: html, http, urls and internationalisation

From: BearHeart / Bill Weinman <bearheart@bearnet.com>
Date: Sun, 28 Jan 1996 17:49:39 -0600
Message-Id: <2.2.32.19960128234939.006cd888@204.145.225.20>
To: Francois Yergeau <yergeau@alis.ca>, http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com

   [ my mailer was doing that stupid "quoted-printable" thing
     again, which it doesn't even decode on receipt! I thought
     I had shut that off a long time ago, but anyway here it is
     again in plain ISO-8859-1. Sorry for the duplication.
                                                     --BearHeart ]

At 03:32 pm 1/28/96 -0500, Francois Yergeau spake:
>  "http://www.alis.com/~François"

   Hmmm... I got this:

404 Pas trouvé

L'URL demandé /~François est introuvable sur ce serveur.

   But I seem to be getting the c-cedilla okay.

   What character set is it expecting?
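
   To make the question concrete, here is a minimal sketch of how the
same path looks on the wire under two candidate character sets. It
assumes Python's standard urllib.parse.quote; which form the server
above actually expects is not known from this exchange.

```python
from urllib.parse import quote

# The path from the quoted URL, with a literal c-cedilla.
path = "/~François"

# Percent-encode the octets the path would have in ISO-8859-1 (one octet for ç).
latin1_form = quote(path, safe="/~", encoding="iso-8859-1")

# Percent-encode the octets the path would have in UTF-8 (two octets for ç).
utf8_form = quote(path, safe="/~", encoding="utf-8")

print(latin1_form)  # /~Fran%E7ois
print(utf8_form)    # /~Fran%C3%A7ois
```

   A server that only sees the octets has no way to tell which of the
two the client meant unless the character set is agreed on or signalled.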

>Personally, I like the implicit UTF-8 idea: any non-ASCII character
>must be sent to a server as its UTF-8 encoding, either URL-encoded

   That leaves out a large segment of the world. Frankly, I don't
think we can get very far with any 8-bit system. Even if we discount
the languages with more than 100 or so characters, we're still stuck
once we try to handle more than two or three--Greek, Cyrillic,
Semitic/Arabic, English--too many characters already.

 ---

SUGGESTION FOR HANDLING HTTP REQUESTS ONLY:

   How about an optional single octet, represented in decimal ASCII,
that specifies a character set? Register a number of them with IANA,
and then it's up to the server to be able to interpret those that
are applicable to the services it handles locally.

   If there is no octet specified, the server defaults to 7-bit
ASCII.

   The ordinal value of the octet could be loosely tied to the
numeric country codes already in use for a number of other purposes.

   So, if the first field of a request is numeric, e.g.:

033 GET /~François HTTP/1.1

   The server knows that this request is using character set
number "33", which would of course have a common representation
for c-cedilla, and voilà! everyone knows who's saying what!
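
   A rough sketch of how a server might read such a request line, in
Python. The CHARSETS table and the parse_request_line name are made up
for illustration: no numbers have been registered, and mapping 33 to
ISO-8859-1 is purely an assumption so that the example above decodes.

```python
import re

# Hypothetical registry. The suggestion names no actual assignments, so
# this single entry is an assumption for illustration only.
CHARSETS = {33: "iso-8859-1"}
DEFAULT_CHARSET = "ascii"  # no numeric prefix: fall back to 7-bit ASCII

# Optional decimal prefix, then the usual method, target and version.
REQUEST_LINE = re.compile(rb"^(?:(\d{1,3}) )?([A-Z]+) (\S+) (HTTP/\d\.\d)$")


def parse_request_line(raw: bytes):
    """Return (method, target, version) with the target decoded to text."""
    match = REQUEST_LINE.match(raw.rstrip(b"\r\n"))
    if match is None:
        raise ValueError("malformed request line")
    prefix, method, target, version = match.groups()

    if prefix is None:
        charset = DEFAULT_CHARSET
    else:
        number = int(prefix)
        if number > 255:
            raise ValueError("character-set number must fit in one octet")
        if number not in CHARSETS:
            raise ValueError("character set %d not handled here" % number)
        charset = CHARSETS[number]

    # Decoding fails with UnicodeDecodeError (a ValueError) if the octets
    # do not fit the selected character set.
    return method.decode("ascii"), target.decode(charset), version.decode("ascii")


print(parse_request_line(b"033 GET /~Fran\xe7ois HTTP/1.1"))
```

   Without the prefix, the same octets are rejected, since 0xE7 is not
7-bit ASCII; a plain "GET /index.html HTTP/1.1" still parses as before.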

 ---

   BTW, the mail header to your message had this:

Mime-Version: 1.0
Mime-Version: 1.0
Mime-Version: 1.0
Mime-Version: 1.0
Mime-Version: 1.0
Mime-Version: 1.0
Mime-Version: 1.0

   Does that make it version 7.0?


+--------------------------------------------------------------------------+
| BearHeart / Bill Weinman | BearHeart@bearnet.com | http://www.bearnet.com/
| Author of The CGI Book -- http://www.bearnet.com/cgibook/
Received on Sunday, 28 January 1996 15:55:02 EST
