Re: html, http, urls and internationalisation

At 03:32 pm 1/28/96 -0500, Francois Yergeau spake:
>  ""

   Hmmm... I got this:

404 Pas trouvé

L'URL demandé /~François est introuvable sur ce serveur.

   But I seem to be getting the c-cedilla okay.

   What character set is it expecting?

>Personally, I like the implicit UTF-8 idea: any non-ASCII character
>must be sent to a server as its UTF-8 encoding, either URL-encoded

   That leaves out a large segment of the world. Frankly, I don't
think we can get very far with any 8-bit system. Even if we discount
the languages with more than 100 or so characters, we're still stuck
once we try to handle more than two or three--Greek, Cyrillic,
Semitic/Arabic, English--too many characters already.
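
   For reference, here is a small sketch of what the two encodings of
the path above look like once URL-encoded. It uses Python's standard
urllib.parse module; the helper name encode_path is made up for this
illustration and is not part of any proposal in this thread.

```python
from urllib.parse import quote, unquote_to_bytes

def encode_path(path: str, charset: str) -> str:
    """URL-encode a path after converting it to bytes in the given charset.

    encode_path is a hypothetical helper for illustration only.
    """
    # Keep "/" and "~" literal; everything non-ASCII becomes %XX escapes.
    return quote(path, safe="/~", encoding=charset)

path = "/~François"

# One octet for c-cedilla in ISO-8859-1, two octets in UTF-8.
latin1_form = encode_path(path, "iso-8859-1")
utf8_form = encode_path(path, "utf-8")

# The server only ever sees octets; it must know (or guess) the charset.
raw_octets = unquote_to_bytes(latin1_form)

print(latin1_form)  # /~Fran%E7ois
print(utf8_form)    # /~Fran%C3%A7ois
print(raw_octets)   # b'/~Fran\xe7ois'
```

   The same name yields different octets depending on the charset,
which is why the server has to be told which one is in use.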



   How about an optional single octet, represented in decimal ASCII,
that specifies a character set? Register a number of them with IANA,
and then it's up to the server to be able to interpret those that
are applicable to the services it handles locally.

   If there is no octet specified, the server defaults to 7-bit

   The ordinal value of the octet could be loosely tied to the
numeric country codes already in use for a number of other purposes.

   So, if the first field of a request is numeric, e.g.:

033 GET /~François HTTP/1.1

   The server knows that this request is using character-set
number "33", which would of course have a common representation
for c-cedilla, and voilà! everyone knows who's saying what!
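
   A rough sketch of how a server might parse such a request line,
in Python. Everything here is an assumption made for illustration:
no such registry exists, the CHARSETS table and its mapping of 33 to
ISO-8859-1 are invented, and parse_request_line is a made-up name.

```python
# Hypothetical registry of charset numbers; the entry is illustrative only.
CHARSETS = {33: "iso-8859-1"}
DEFAULT_CHARSET = "ascii"  # no prefix: fall back to 7-bit

def parse_request_line(raw: bytes):
    """Split a request line, honouring an optional numeric charset prefix."""
    parts = raw.strip().split(b" ")
    charset = DEFAULT_CHARSET
    # A leading all-digit field is taken as the charset number.
    if parts and parts[0].isdigit():
        number = int(parts[0].decode("ascii"))
        if number > 255:
            raise ValueError("charset number does not fit in one octet")
        if number not in CHARSETS:
            raise LookupError(f"charset {number} not handled by this server")
        charset = CHARSETS[number]
        parts = parts[1:]
    if len(parts) != 3:
        raise ValueError("expected METHOD PATH VERSION")
    method, path, version = parts
    # Only the path is decoded with the announced charset.
    return (method.decode("ascii"), path.decode(charset),
            version.decode("ascii"), charset)

result = parse_request_line(b"033 GET /~Fran\xe7ois HTTP/1.1")
print(result)
```

   With the prefix, the octet E7 decodes to c-cedilla. Without it, the
same request fails to decode as 7-bit, so the server can reject it
rather than guess.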


   BTW, the mail header to your message had this:

Mime-Version: 1.0
Mime-Version: 1.0
Mime-Version: 1.0
Mime-Version: 1.0
Mime-Version: 1.0
Mime-Version: 1.0
Mime-Version: 1.0

   Does that make it version 7.0?

| BearHeart / Bill Weinman | |
| Author of The CGI Book --

Received on Sunday, 28 January 1996 15:36:06 UTC