Re: html forms accept-charset

Hello Tex,

At 14:49 03/09/29 -0400, Tex Texin wrote:

>Hi,
>
>I tested MS IE 6.02800 with a page of meta charset=
>
>1) windows-1252 and
>2) utf-8
>3) user-override
>
>and a form with accept-charset=utf-8.
>I tried both GET and POST of the form.
>
>I found that:
>
>A) Under no circumstances does IE send the accept-charset

Do you want to say 'send data encoded in the encoding of accept-charset',
or what?


>or a content-type
>with charset=.
>
>B) That IE converts text data as a string of bytes, and values above 127 are
>sent with
>%hh encoding.

This is what the URI spec requires.


>If the page had charset of 1252, it sent text as 1252 bytes.
>If the page had charset of utf-8, IE sent text as utf-8 bytes.
>If the page was overriden with a new charset, it used the new charset!
>(so much for send back what you receive.)

As I argued on another list, people don't really override, unless
they have a strong reason to (avoid garbage on the screen), in
which case the browser is actually sending back what it received,
just not what it was told it was receiving.


>The accept-charset in the form did nothing.
>GET and POST behaved the same way with respect to text encoding.
>
>So, at least for IE, it always sends back the text of the form
>in the current encoding being viewed by the user (whether it was set by http,
>meta, or user override).
>
>I don't intend to test other browsers.
>
>I think the recommendation is to not use accept-charset in forms, as it just
>adds confusion.

I'm not sure. On www-international, we had reports that for
some browsers, it can increase the likelyhood of getting the
right thing back. That would mean:

- Use accept-charset to protect against the user changing
   the page encoding (on some browsers)
- Only use an accept-charset with the same encoding as that
   of the page, don't try to ask browsers to send data back
   in another encoding.


>I think it becomes important to embed a value that can be used to check if the
>user changed the expected encoding.

Only if that's a scenario you have to protect against,
which I think isn't a realistic scenario.


Regards,    Martin.



>comments?
>tex
>
>
>Tex Texin wrote:
> >
> > I am writing about html and encodings and I notice that most references 
> on the
> > i18n site and elsewhere gloss over encodings declaration with respect 
> to forms.
> >
> > Our faq on multilingual forms doesn't talk about the declaration. Nor 
> does the
> > workgroup charset page.
> > (They were among the first places I looked for a guideline.)
> >
> > What we do say is we recommend using utf-8 and hope that browsers give back
> > what is sent out.
> > We don't mention that forms can specify an accept-charset and whether they
> > should (in addition to the above) or not.
> >
> > The standard says:
> >
> > "The default value for this attribute is the reserved string "UNKNOWN". 
> User
> > agents may interpret this value as the character encoding that was used to
> > transmit the document containing this FORM element."
> >
> > Perhaps this is good enough, although we know that the browser for a 
> number of
> > reasons can have the wrong or at least a different encoding than the author
> > intended. I suspect it is better to
> > make it explicit.
> >
> > So, my question is:
> > 1) Should we be more concrete about accept-charset in the <form> 
> statement, and
> > should we:
> >
> > a) recommend not using it and just go with what goes around comes around?
> >
> > b) recommend using it? if so with what values?
> > utf-8 only? utf-8 higher than others on the list?
> >
> > 2) Do browsers follow the accept-charset? ie. Will they always convert 
> to utf-8
> > if that is specified?
> > (even if the page was in another charset?)
> >
> > I am asking if the following example is recommended (note accept-charset):
> >
> > <FORM action="http://www.xencraft.com" method="post" 
> accept-charset="UTF-8">
> >     <P>
> >     Name: <INPUT type="text" name="firstname"><BR>
> >     <INPUT type="submit" value="Send"> <INPUT type="reset">
> >     </P>
> >  </FORM>
> >
> > or better to use accept-charset="UTF-8;q=1.0,*;q=0.5" ?
> >
> > 3) Has anyone done any testing around this?
> >
> > 4) Someone (after it is answered) want to convert this to a faq?
> > Add it to the guidelines doc?
> >
> > tex
> >
> > --
> > -------------------------------------------------------------
> > Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
> > Xen Master                          http://www.i18nGuy.com
> >
> > XenCraft                            http://www.XenCraft.com
> > Making e-Business Work Around the World
> > -------------------------------------------------------------
>
>--
>-------------------------------------------------------------
>Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
>Xen Master                          http://www.i18nGuy.com
>
>XenCraft                            http://www.XenCraft.com
>Making e-Business Work Around the World
>-------------------------------------------------------------

Received on Wednesday, 1 October 2003 11:30:54 UTC