Re: Form response charset

Just to add to the info already posted:

The way I handle this is to pass a hidden field whose value is the charset
named in the META tag, e.g.

<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
</HEAD>
<BODY>
<FORM ACTION="http://mywebserver/cgi-bin/mycgiprogram" METHOD="POST">
<INPUT TYPE="hidden" NAME="myformcharset" VALUE="UTF-8">
</FORM>
</BODY>
</HTML>
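
On the receiving side, the CGI program can read the hidden field first and then
use it to decode everything else.  Here is a rough sketch in Python, not a
definitive implementation.  The function name decode_form and the ISO-8859-1
fallback are my own assumptions for illustration; only the field name
"myformcharset" comes from the form above.

```python
import codecs
from urllib.parse import parse_qsl


def decode_form(body, charset_field="myformcharset", default="iso-8859-1"):
    """Decode a urlencoded form body using the charset named in a hidden field.

    decode_form and the ISO-8859-1 fallback are illustrative assumptions.
    """
    # Pass 1: treat every byte as one Latin-1 character.  This is lossless,
    # so the original bytes can be recovered once the charset is known.
    pairs = parse_qsl(body.decode("latin-1"),
                      keep_blank_values=True,
                      encoding="latin-1")

    # The charset name itself is plain ASCII, so it is readable already.
    charset = dict(pairs).get(charset_field, default)
    try:
        codecs.lookup(charset)
    except LookupError:
        charset = default  # unknown or bogus name: fall back

    # Pass 2: turn each value back into bytes and decode with the real charset.
    def redecode(s):
        return s.encode("latin-1").decode(charset, errors="replace")

    return {redecode(name): redecode(value) for name, value in pairs}


if __name__ == "__main__":
    sample = b"myformcharset=UTF-8&name=%E6%97%A5%E6%9C%AC"
    print(decode_form(sample))
```

The two passes are needed because the charset is not known until the hidden
field has been read; decoding as Latin-1 first just keeps the raw bytes intact.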

But be aware:  Input text fields are handled by the _native_ system.  That is,
even though I'm on an en_US system, I can view Japanese Web pages because I
have the appropriate fonts configured in my browsers.  However, if I want to
type text into a Japanese (EUC-JP) form, I cannot type Japanese so that it
_looks_ like Japanese.  If I know the Latin1 (ISO-8859-1) equivalents for the
Japanese I want to type, I can fake the system out and type the equivalents, and
the browser will convert the data into the Japanese charset (EUC-JP in this
case).  By the same token, if you pre-fill the input field with Japanese text,
it will look like garbage if the native system doesn't handle Japanese.  But
once you submit the form, the data will be OK.
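
To illustrate what I mean by "equivalents", here is a small Python sketch.  It
assumes the Latin1 equivalents are simply the Latin1 characters whose byte
values match the EUC-JP bytes of the Japanese text; the helper names
latin1_view and recover are made up for this example.

```python
def latin1_view(text, charset="euc_jp"):
    """Show how text in `charset` appears on a system that assumes Latin-1.

    Hypothetical helper: encodes to the page charset, then reinterprets
    each byte as one Latin-1 character.
    """
    return text.encode(charset).decode("latin-1")


def recover(view, charset="euc_jp"):
    """Reverse latin1_view: the underlying bytes were never changed."""
    return view.encode("latin-1").decode(charset)


if __name__ == "__main__":
    japanese = "\u65e5\u672c\u8a9e"    # three Japanese characters
    garbage = latin1_view(japanese)    # what the en_US text field shows
    print(repr(garbage))
    print(recover(garbage) == japanese)
```

The text looks like garbage on screen, but the bytes are the same in both
directions, which is why the submitted data is still OK.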

I've seen Netscape and IE both behave this way.

One more interesting note:  if you translate ALT text into anything other than
Latin1, it will be garbage when shown as a tool tip (the pop-up yellow box).  In
Netscape (this is 4.0x), if the image doesn't display at all, the non-Latin1 ALT
text displays properly; but in IE 4.x, the ALT text is still garbage.  I believe
it's a font issue for the pop-up boxes in the case of Netscape.  I haven't
tested this in Netscape 4.5 or IE 5.0.

Regards,
Andrea
-- 
Andrea Vine
Sun Internet Mail Server i18n architect
avine@eng.sun.com
Remember: stressed is desserts spelled backwards.

Klaus Weide wrote:
> 
> On Wed, 14 Apr 1999, Jason Pouflis wrote:
> 
> > Browsers then did not submit the charset encoding along with data
> > nor could I find a pre-fabricated solution for best guessing encoding type.
> > This may have changed, please forward useful responses or your summary.
> >
> > wrt testing on different browsers, I found that although my
> > utf-8 pages would display properly on
> > IE4 (english + japanese IME) on Win95/NT (english),
> > that they didn't display properly on
> > IE4 (japanese) on Win95 (japanese).
> >
> >
> > A response I got on 13 May 1998 from Roman Czyborra was:
> > ==============================================
> > > How do I tell what character set form data is submitted in?
> >
> > There is a discussion of this issue in section 5 of RFC 2070.
> > Ideally, the client sends something like
> >
> > Content-Type: application/x-www-form-urlencoded; charset=UTF-8
> >
> > In practice, most browsers don't send the charset parameter and leave
> > you to guessing what the data might be supposed to mean.
> > Even Lynx 2-8-2 and Netscape 4.04 don't send it.
> 
> Actually, Lynx (2-8-2 and some earlier versions) *is* able to send the
> charset parameter if appropriate.  It just doesn't always do it, in
> order to not confuse existing scripts.  But in a form with an
> ACCEPT-CHARSET="utf-8" attribute, AND the submission data actually
> containing non-US-ASCII characters, you should see the charset
> parameter being sent.
> 
>    Klaus

Received on Thursday, 15 April 1999 14:18:24 UTC