Re: unicode

From: Erik van der Poel <erik@netscape.com>
Date: Tue, 02 Jun 1998 15:01:36 -0700
Message-ID: <35747640.B1B1450@netscape.com>
To: Aman Choudhary <aman@asu.edu>
CC: www-international@w3.org

Hello Aman,

The HTML 4.0 spec mentions the accept-charset attribute:

http://www.w3.org/TR/REC-html40/interact/forms.html#adef-accept-charset

However, Netscape has not implemented this yet. I don't know about MSIE.

If you want to be able to receive input from a large number of current
users, your best bet might be to use "traditional" charsets
(non-Unicode) in the HTML forms, and then to convert to Unicode on the
server side (CGI side).

HTML's meta tag is one way to indicate the charset, but it is known to
be problematic in some installed clients. It would be more reliable if
you indicated the charset in the HTTP Content-Type header. (I'm assuming
you're using HTTP.)

One easy way to emit the HTTP charset is to use a CGI script to emit the
form itself. For example, if you are using Unix, you might write:

#!/bin/sh
# Send the charset in the HTTP header, then a blank line, then the form.
echo 'Content-Type: text/html; charset=gb2312'
echo
cat chinese-form.html

I used Chinese as an example, since you mentioned Chinese. The charset
label "gb2312" is for "Simplified Chinese".

You would need another CGI to receive the form submission.
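
As a rough sketch only, such a receiving CGI might look something like
the following. It assumes bash (for the \xHH escapes in printf), an
iconv that knows the name GB2312, a form submitted via POST, and a
hypothetical field named "comment"; none of those details come from
your setup, so adjust as needed.

```shell
#!/bin/bash
# Sketch of a receiving CGI: URL-decode one form field, then convert it
# from GB2312 to Unicode (UTF-8). Assumes bash and an iconv with GB2312.

# Decode application/x-www-form-urlencoded text: '+' means a space and
# %HH stands for one raw byte. Uses bash's printf '%b' to expand \xHH.
# Caveat: a literal backslash in the input would also be interpreted.
urldecode() {
    local s=${1//+/ }
    printf '%b' "${s//%/\\x}"
}

# Convert raw GB2312 bytes on stdin to UTF-8 on stdout.
gb2312_to_utf8() {
    iconv -f GB2312 -t UTF-8
}

# Only run the request handling under a web server, which sets
# REQUEST_METHOD and CONTENT_LENGTH for a POST.
if [ "${REQUEST_METHOD:-}" = "POST" ]; then
    # Read exactly CONTENT_LENGTH bytes of the request body.
    body=$(head -c "${CONTENT_LENGTH:-0}")
    # "comment" is a hypothetical field name; take its first occurrence.
    value=$(printf '%s' "$body" | tr '&' '\n' | sed -n 's/^comment=//p' | head -n 1)
    echo 'Content-Type: text/plain; charset=utf-8'
    echo
    urldecode "$value" | gb2312_to_utf8
fi
```

The script just echoes the converted text back; a real one would store
the UTF-8 bytes wherever you keep your data.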

An alternative is to write the HTML form in Unicode (UTF-8), and to
label the charset accordingly, so that the client converts the user's
text to Unicode before submitting the form to the server. But Unicode
support is relatively new in the clients, and you may not have as much
luck with this method.
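
If you do want to try that, the same kind of script can carry the
UTF-8 label. This is a minimal sketch: the function name emit_form and
the file name unicode-form.html are made up for illustration, and the
file itself would have to be saved as UTF-8.

```shell
#!/bin/sh
# Sketch: serve a form with an explicit charset label in the HTTP header.

# emit_form CHARSET FILE
# Prints the Content-Type header, a blank line, and then the file.
emit_form() {
    printf 'Content-Type: text/html; charset=%s\n\n' "$1"
    cat "$2"
}

# Only emit the page when run as a CGI; servers set GATEWAY_INTERFACE.
# unicode-form.html is a hypothetical file name.
if [ -n "${GATEWAY_INTERFACE:-}" ]; then
    emit_form utf-8 unicode-form.html
fi
```

Passing the charset as an argument means the same script could serve
the gb2312 version of the form as well.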

Also, I don't know if MSIE supports the charset name "gb2312". One would
have to check.

Sigh. I know this isn't easy. It ought to be easier than this. Let me
know if I can be of further assistance with HTTP labelling, charset
names, etc.

Erik van der Poel
Netscape

Aman Choudhary wrote:
> 
> I still haven't found a way to store information that I retrieve from
> the internet in Unicode rather than in ASCII, which would mean that I
> can get information in practically any language from the internet.
> 
> I have installed all kinds of input method editors, which allow one to
> input data in Chinese and other languages. But I still haven't found a
> connection to Unicode in them.
> 
> What I really want to do:
> 
> |  <meta HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=*">
> |   * = the appropriate ISO code for that language
> |        <input type="text">
>                         | result (CGI/ASP)
>                         V
>             The text box value stored as Unicode

Received on Tuesday, 2 June 1998 18:01:50 GMT
