Re: Proposal for new introductory article/tip

John O'Conner wrote:
> The attached document is a preliminary suggestion for a document that 
> describes a common problem between form data and servers that use that 
> data. 
Hi John,

It seems that not all browsers behave the same way.
The result also seems to depend not only on the whole document charset 
or the form charset attribute, but also on the "View>Charset encoding" 
setting of your browser.

Here is a quick test I made. With an Arabic field, Firefox sends Unicode 
code points as numeric character references (e.g. &#1583;&#1576;&#1610; 
for دبي) if it is set to Western (ISO-8859-1). If set to UTF-8, Firefox 
sends the Arabic field encoded in UTF-8.

Set to default or Western (ISO-8859-1), Safari sends "???" (unknown 
characters) as the value of the Arabic form field. To get the correct 
value, the encoding should be set to UTF-8.
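
To make the three behaviors above concrete, here is a rough Python 
sketch of what the submitted field value might look like on the wire. 
It only imitates the observed results with Python's standard codec 
error handlers; it is not the browsers' actual code, and the helper 
name encode_field is made up for illustration.

```python
from urllib.parse import quote


def encode_field(value: str, charset: str, errors: str) -> str:
    """Hypothetical helper: encode a form value in `charset`, then percent-encode it."""
    return quote(value.encode(charset, errors=errors), safe="")


value = "دبي"  # U+062F U+0628 U+064A

# Browser set to UTF-8: the field is sent as UTF-8 bytes.
as_utf8 = encode_field(value, "utf-8", "strict")

# Browser set to ISO-8859-1, Firefox-like fallback:
# unencodable characters become numeric character references.
as_ncr = encode_field(value, "iso-8859-1", "xmlcharrefreplace")

# Browser set to ISO-8859-1, Safari-like fallback:
# each unencodable character becomes "?".
as_question_marks = encode_field(value, "iso-8859-1", "replace")

print(as_utf8)            # %D8%AF%D8%A8%D9%8A
print(as_ncr)             # %26%231583%3B%26%231576%3B%26%231610%3B
print(as_question_marks)  # %3F%3F%3F
```

Only the first form can be decoded back to the original text without 
extra knowledge. The second needs the server to expand the references, 
and the third has lost the characters entirely.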

More tests should be done, especially with different combinations of 
HTML source charset settings. I will send feedback on this later.

Regards, Najib

> The document is not final, not even close. However, let's use this to 
> begin a discussion.
>
> Regards,
> John O'Conner
>
>
> ------------------------------------------------------------------------
>
>
>   Charsets in HTML Forms
>
> Modern web browsers encode HTML form data in the charset encoding 
> given by the form's accept-charset attribute. If the form has no 
> explicit accept-charset attribute, browsers use the enclosing page's 
> charset encoding instead. When users submit form data to servers, 
> browsers provide that data in either an HTTP GET or POST request. In 
> either case, the browser encodes the data before sending it to the 
> server.
>
> This page describes a potential problem in the interaction between 
> encoded form data and the servers that interpret or decode that data, 
> and then outlines solutions.
>
>
>     Problem Description
>
> One common problem is the misinterpretation of form data on the server 
> side. Servers sometimes decode form data using the wrong charset 
> encoding, and the result is garbled or lost data.
>
>
>     Causes
>
> Nothing in the encoded data itself communicates the charset encoding 
> of that data. Server code that parses data often makes incorrect 
> assumptions about encodings. For example, even though your browser 
> sends UTF-8 encoded data, the server may mistakenly use the ISO-8859-1 
> charset to decode the data.
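
A small Python sketch of this mismatch, with "café" as an arbitrary 
sample value. It shows both outcomes the draft mentions: garbled text 
when UTF-8 bytes are read as ISO-8859-1, and lost text when 
ISO-8859-1 bytes are read as UTF-8.

```python
# The browser sends UTF-8, but the server assumes ISO-8859-1.
sent_utf8 = "café".encode("utf-8")           # b'caf\xc3\xa9'
garbled = sent_utf8.decode("iso-8859-1")     # every byte maps to some character

# The browser sends ISO-8859-1, but the server assumes UTF-8.
sent_latin1 = "café".encode("iso-8859-1")    # b'caf\xe9'
lost = sent_latin1.decode("utf-8", errors="replace")  # invalid byte -> U+FFFD

print(garbled)  # cafÃ©
print(lost)     # caf followed by the replacement character U+FFFD
```

In neither case does the decoder report a problem, which is why the 
wrong assumption often goes unnoticed until users see bad data.
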
>
>
>     Problem Examples
>
> From Richard:
>
> Problem scenarios (in each case explain, using an example, how the 
> characters are sent and the potential issues, and recommend best 
> practices):
>
>   1. the user types into the form field characters that are not in
>      the encoding of the document
>
>   2. some process adds characters to the form that are not in the
>      encoding of the document
>
>   3. the server doesn't know the encoding of the data it is receiving
>
>   4. the server is receiving data in multiple encodings
>
>   5. the user changes the encoding of the document manually before
>      the form data is sent
>
>
>     Solutions
>
> Describe a couple of solutions:
>
>   1. Provide charset encoding as a hidden form field that the server
>      can read and apply to remaining fields
>
>   2. Standardize on a single charset, UTF-8.
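
The first solution could be sketched on the server side roughly as 
follows, in Python. Everything named here is an assumption for 
illustration: the hidden field name form_charset, the function 
decode_form, and the UTF-8 default used when the field is missing or 
names an unknown charset (which is in effect the second solution).

```python
import codecs
from urllib.parse import parse_qsl


def decode_form(body: bytes, charset_field: str = "form_charset",
                default: str = "utf-8") -> dict:
    """Decode a urlencoded form body using the charset named in a hidden field."""
    # First pass: ISO-8859-1 maps every byte to exactly one character,
    # so the raw bytes survive and can be recovered afterwards.
    raw = dict(parse_qsl(body.decode("ascii"), keep_blank_values=True,
                         encoding="iso-8859-1"))

    charset = raw.get(charset_field, default)
    try:
        codecs.lookup(charset)
    except LookupError:
        charset = default  # unknown charset name: fall back to the default

    # Second pass: turn each string back into bytes, then decode properly.
    def redecode(text: str) -> str:
        return text.encode("iso-8859-1").decode(charset, errors="replace")

    return {redecode(k): redecode(v) for k, v in raw.items()}


form = decode_form(b"form_charset=utf-8&city=%D8%AF%D8%A8%D9%8A")
print(form["city"])  # دبي
```

The hidden field only helps if its own value is plain ASCII, so that 
the server can read it before knowing the charset of the other fields.
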
>


-- 
Najib TOUNSI (mailto:tounsi @ w3.org)
Bureau W3C au Maroc (http://www.w3c.org.ma/)
Ecole Mohammadia d'Ingenieurs, BP 765 Agdal-RABAT Maroc (Morocco)
Phone : +212 (0) 37 68 71 50 (P1711)  Fax : +212 (0) 37 77 88 53
Mobile: +212 (0) 61 22 00 30 

Received on Wednesday, 21 February 2007 22:14:02 UTC