Re: html forms accept-charset

Hi,

I tested MS IE 6.02800 with a page of meta charset= 

1) windows-1252 and 
2) utf-8
3) user-override

and a form with accept-charset=utf-8.
I tried both GET and POST of the form.

I found that:

A) Under no circumstances does IE send the accept-charset or a content-type
with charset=.

B) That IE converts text data as a string of bytes, and values above 127 are
sent with
%hh encoding.

If the page had charset of 1252, it sent text as 1252 bytes.
If the page had charset of utf-8, IE sent text as utf-8 bytes.
If the page was overriden with a new charset, it used the new charset!
(so much for send back what you receive.)

The accept-charset in the form did nothing.
GET and POST behaved the same way with respect to text encoding.

So, at least for IE, it always sends back the text of the form
in the current encoding being viewed by the user (whether it was set by http,
meta, or user override).

I don't intend to test other browsers.

I think the recommendation is to not use accept-charset in forms, as it just
adds confusion.

I think it becomes important to embed a value that can be used to check if the
user changed the expected encoding.

comments?
tex


Tex Texin wrote:
> 
> I am writing about html and encodings and I notice that most references on the
> i18n site and elsewhere gloss over encodings declaration with respect to forms.
> 
> Our faq on multilingual forms doesn't talk about the declaration. Nor does the
> workgroup charset page.
> (They were among the first places I looked for a guideline.)
> 
> What we do say is we recommend using utf-8 and hope that browsers give back
> what is sent out.
> We don't mention that forms can specify an accept-charset and whether they
> should (in addition to the above) or not.
> 
> The standard says:
> 
> "The default value for this attribute is the reserved string "UNKNOWN". User
> agents may interpret this value as the character encoding that was used to
> transmit the document containing this FORM element."
> 
> Perhaps this is good enough, although we know that the browser for a number of
> reasons can have the wrong or at least a different encoding than the author
> intended. I suspect it is better to
> make it explicit.
> 
> So, my question is:
> 1) Should we be more concrete about accept-charset in the <form> statement, and
> should we:
> 
> a) recommend not using it and just go with what goes around comes around?
> 
> b) recommend using it? if so with what values?
> utf-8 only? utf-8 higher than others on the list?
> 
> 2) Do browsers follow the accept-charset? ie. Will they always convert to utf-8
> if that is specified?
> (even if the page was in another charset?)
> 
> I am asking if the following example is recommended (note accept-charset):
> 
> <FORM action="http://www.xencraft.com" method="post" accept-charset="UTF-8">
>     <P>
>     Name: <INPUT type="text" name="firstname"><BR>
>     <INPUT type="submit" value="Send"> <INPUT type="reset">
>     </P>
>  </FORM>
> 
> or better to use accept-charset="UTF-8;q=1.0,*;q=0.5" ?
> 
> 3) Has anyone done any testing around this?
> 
> 4) Someone (after it is answered) want to convert this to a faq?
> Add it to the guidelines doc?
> 
> tex
> 
> --
> -------------------------------------------------------------
> Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
> Xen Master                          http://www.i18nGuy.com
> 
> XenCraft                            http://www.XenCraft.com
> Making e-Business Work Around the World
> -------------------------------------------------------------

-- 
-------------------------------------------------------------
Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
Xen Master                          http://www.i18nGuy.com
                         
XenCraft		            http://www.XenCraft.com
Making e-Business Work Around the World
-------------------------------------------------------------

Received on Monday, 29 September 2003 14:50:28 UTC