- From: Martin Duerst <duerst@w3.org>
- Date: Wed, 01 Oct 2003 11:29:48 -0400
- To: Tex Texin <tex@i18nguy.com>, GEO <public-i18n-geo@w3.org>
Hello Tex,

At 14:49 03/09/29 -0400, Tex Texin wrote:
>Hi,
>
>I tested MS IE 6.02800 with a page of meta charset=
>
>1) windows-1252 and
>2) utf-8
>3) user-override
>
>and a form with accept-charset=utf-8.
>I tried both GET and POST of the form.
>
>I found that:
>
>A) Under no circumstances does IE send the accept-charset

Do you want to say 'send data encoded in the encoding
of accept-charset', or what?

>or a content-type
>with charset=.
>
>B) That IE converts text data as a string of bytes, and values above 127 are
>sent with %hh encoding.

This is what the URI spec requires.

>If the page had charset of 1252, it sent text as 1252 bytes.
>If the page had charset of utf-8, IE sent text as utf-8 bytes.
>If the page was overridden with a new charset, it used the new charset!
>(so much for send back what you receive.)

As I argued on another list, people don't really override,
unless they have a strong reason to (avoid garbage on the screen),
in which case the browser is actually sending back what it
received, just not what it was told it was receiving.

>The accept-charset in the form did nothing.
>GET and POST behaved the same way with respect to text encoding.
>
>So, at least for IE, it always sends back the text of the form
>in the current encoding being viewed by the user (whether it was set by http,
>meta, or user override).
>
>I don't intend to test other browsers.
>
>I think the recommendation is to not use accept-charset in forms, as it just
>adds confusion.

I'm not sure. On www-international, we had reports that for some
browsers, it can increase the likelihood of getting the right
thing back. That would mean:

- Use accept-charset to protect against the user changing the
  page encoding (on some browsers)
- Only use an accept-charset with the same encoding as that of
  the page, don't try to ask browsers to send data back in
  another encoding.

>I think it becomes important to embed a value that can be used to check if the
>user changed the expected encoding.

Only if that's a scenario you have to protect against, which
I think isn't a realistic scenario.

Regards, Martin.

>comments?
>tex
>
>
>Tex Texin wrote:
> >
> > I am writing about html and encodings and I notice that most references on the
> > i18n site and elsewhere gloss over encodings declaration with respect to forms.
> >
> > Our faq on multilingual forms doesn't talk about the declaration. Nor does the
> > workgroup charset page.
> > (They were among the first places I looked for a guideline.)
> >
> > What we do say is we recommend using utf-8 and hope that browsers give back
> > what is sent out.
> > We don't mention that forms can specify an accept-charset and whether they
> > should (in addition to the above) or not.
> >
> > The standard says:
> >
> > "The default value for this attribute is the reserved string "UNKNOWN". User
> > agents may interpret this value as the character encoding that was used to
> > transmit the document containing this FORM element."
> >
> > Perhaps this is good enough, although we know that the browser for a number of
> > reasons can have the wrong or at least a different encoding than the author
> > intended. I suspect it is better to
> > make it explicit.
> >
> > So, my question is:
> > 1) Should we be more concrete about accept-charset in the <form> statement, and
> > should we:
> >
> > a) recommend not using it and just go with what goes around comes around?
> >
> > b) recommend using it? if so with what values?
> > utf-8 only? utf-8 higher than others on the list?
> >
> > 2) Do browsers follow the accept-charset? ie. Will they always convert to utf-8
> > if that is specified?
> > (even if the page was in another charset?)
> >
> > I am asking if the following example is recommended (note accept-charset):
> >
> > <FORM action="http://www.xencraft.com" method="post" accept-charset="UTF-8">
> > <P>
> > Name: <INPUT type="text" name="firstname"><BR>
> > <INPUT type="submit" value="Send"> <INPUT type="reset">
> > </P>
> > </FORM>
> >
> > or better to use accept-charset="UTF-8;q=1.0,*;q=0.5" ?
> >
> > 3) Has anyone done any testing around this?
> >
> > 4) Someone (after it is answered) want to convert this to a faq?
> > Add it to the guidelines doc?
> >
> > tex
> >
> > --
> > -------------------------------------------------------------
> > Tex Texin cell: +1 781 789 1898 mailto:Tex@XenCraft.com
> > Xen Master http://www.i18nGuy.com
> >
> > XenCraft http://www.XenCraft.com
> > Making e-Business Work Around the World
> > -------------------------------------------------------------
>
>--
>-------------------------------------------------------------
>Tex Texin cell: +1 781 789 1898 mailto:Tex@XenCraft.com
>Xen Master http://www.i18nGuy.com
>
>XenCraft http://www.XenCraft.com
>Making e-Business Work Around the World
>-------------------------------------------------------------
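
The behaviour reported in this thread (form text is converted to bytes in whatever encoding the page is currently viewed in, and bytes above 127 are sent as %hh) and the suggestion to embed a known value for detecting a changed encoding can be sketched roughly as follows. This is only an illustration in Python, not code from the thread: the function names `form_encode` and `guess_submitted_charset`, the sentinel character, and the list of candidate charsets are all assumptions made for the example.

```python
from urllib.parse import quote, unquote_to_bytes

# Hypothetical sentinel: a non-ASCII character placed in a hidden form
# field, whose byte sequence differs between the candidate encodings.
SENTINEL = "\u00e9"  # e with acute accent


def form_encode(value: str, page_charset: str) -> str:
    """Mimic the reported browser behaviour: encode the text in the
    page's current charset, then percent-escape the resulting bytes."""
    return quote(value.encode(page_charset), safe="")


def guess_submitted_charset(raw_sentinel: str,
                            candidates=("utf-8", "windows-1252")):
    """Return the first candidate charset under which the submitted,
    still percent-encoded sentinel decodes back to SENTINEL, else None."""
    raw = unquote_to_bytes(raw_sentinel)
    for charset in candidates:
        try:
            if raw.decode(charset) == SENTINEL:
                return charset
        except UnicodeDecodeError:
            continue  # bytes are not valid in this charset; try the next
    return None


if __name__ == "__main__":
    for charset in ("windows-1252", "utf-8"):
        sent = form_encode(SENTINEL, charset)
        print(charset, "->", sent, "->", guess_submitted_charset(sent))
```

A server using this kind of check would compare the sentinel before decoding the other fields, and decode them with the detected charset rather than the one the page was served in. Whether that is worth doing depends on how likely a user override is in practice, which is the point under discussion above.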
Received on Wednesday, 1 October 2003 11:30:54 UTC