Re: FW: what should the charset be in the response to the server from Martin Duerst on 2003-08-07 (www-international@w3.org from July to September 2003)

From: Martin Duerst <duerst@w3.org>
Date: Thu, 07 Aug 2003 11:43:51 -0400
To: "Sinha, Raj (Raj)" <rajsinha@avaya.com>, <www-international@w3.org>
Message-Id: <4.2.0.58.J.20030807112040.053b5850@localhost>

Hello Chris, others,

Many thanks for the very interesting discussion on a very
important topic. Sorry to be a bit late with my comments.

At 11:22 03/07/22 -0400, Sinha, Raj (Raj) wrote:
>Posting a response to this question that was replied only to me ... don't 
>know why. so forwarding it on behalf of Chris

>>The only response encoding you can rely on is from browser HTTP PUT 
>>commands, where one of the headers tells the server what encoding has 
>>been used.
>>
>>The encoding used in HTTP GET is undefined in the standards for 
>>characters outside of 7-bit ASCII.
>>
>>Anyone who says it is the encoding of the page is correct but misleading, 
>>as the browser's user can manually decide what that encoding is (changing 
>>whatever was declared in the transmitted page), so a web server can have 
>>no certainty about the encoding used in the %hh escapes in a GET, which 
>>is how non-ASCII is sent.
>
>Yes, a user can do that. A user can also save the page locally,
>edit it to change accept-charset or convert it to another encoding,...
>reload the page, and then fill it in and submit.
>
>In short, yes, a user can screw up if s/he wants. The average
>user has absolutely no reason to do that (unless what they
>receive is not labelled correctly and therefore does not
>display correctly, in which case it's the page author, and
>not the user, who is to blame).
>
>Doing everything in UTF-8 has an additional advantage (at least
>for cases where your server-side logic gives you the data as
>octets, rather than converted to characters): UTF-8 has a very
>well-known distinctive pattern, which can be checked with
>a regular expression. (see e.g.
>http://www.w3.org/International/questions/qa-forms-utf-8.html).
>So even in case the user tried to mess around with the encoding
>manually, this can be easily caught.
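
To make the check above concrete, here is a minimal sketch in Python. It is not copied from the page linked above; it only follows the general byte pattern of well-formed UTF-8. The names UTF8_PATTERN, looks_like_utf8 and decode_form_value are made up for this example, and it assumes the server-side code hands you the still-escaped form value as a string.

```python
import re
from typing import Optional
from urllib.parse import unquote_to_bytes

# Byte-level pattern for well-formed UTF-8 (hypothetical name, illustration only).
UTF8_PATTERN = re.compile(
    rb"""
    (?:
        [\x00-\x7F]                        # ASCII
      | [\xC2-\xDF][\x80-\xBF]             # 2-byte, no overlongs
      | \xE0[\xA0-\xBF][\x80-\xBF]         # 3-byte, no overlongs
      | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # ordinary 3-byte
      | \xED[\x80-\x9F][\x80-\xBF]         # 3-byte, no surrogates
      | \xF0[\x90-\xBF][\x80-\xBF]{2}      # 4-byte, planes 1-3
      | [\xF1-\xF3][\x80-\xBF]{3}          # 4-byte, planes 4-15
      | \xF4[\x80-\x8F][\x80-\xBF]{2}      # 4-byte, plane 16
    )*
    """,
    re.VERBOSE,
)


def looks_like_utf8(octets: bytes) -> bool:
    """Return True if the whole octet string is well-formed UTF-8."""
    return UTF8_PATTERN.fullmatch(octets) is not None


def decode_form_value(escaped: str) -> Optional[str]:
    """Undo form escaping, then accept the value only if it is UTF-8.

    Returns None when the octets do not look like UTF-8, for example
    because the user switched the browser to another encoding.
    """
    # In form data "+" stands for a space; a literal plus arrives as %2B.
    octets = unquote_to_bytes(escaped.replace("+", " "))
    if not looks_like_utf8(octets):
        return None
    return octets.decode("utf-8")
```

With this sketch, decode_form_value("caf%C3%A9") yields the decoded string, while decode_form_value("caf%E9") (the same word sent as ISO-8859-1) yields None, so the mismatch is caught rather than silently misread.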
>
>In addition, it should be noted that the distinction between
>GET and POST is by design meant to separate operations without
>side effects from operations with side effects. For something like
>a search or database query, GET is appropriate, because the
>user can bookmark the resulting URI. For actual operations such
>as ordering something, POST is appropriate, because then
>(correctly implemented) browsers will warn the user when the
>user tries to repeat the operation (most probably just to
>again look at the same page, not to post another order).
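
As a rough illustration of that split, the following Python sketch builds a bookmarkable GET URI for a search and a POST request for an order, with form values escaped as UTF-8. The host example.org, the field names, and the helpers search_uri and order_request are all assumptions made up for this example; nothing is actually sent over the network.

```python
from urllib.parse import urlencode
from urllib.request import Request


def search_uri(base: str, query: str) -> str:
    """Build a GET URI for a side-effect-free search.

    The query travels in the URI itself (escaped as UTF-8),
    so the user can bookmark the result.
    """
    return base + "?" + urlencode({"q": query}, encoding="utf-8")


def order_request(base: str, fields: dict) -> Request:
    """Build a POST request for an operation with side effects.

    The form data travels in the request body, not in the URI.
    """
    body = urlencode(fields, encoding="utf-8").encode("ascii")
    return Request(
        base,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Hypothetical usage; the requests are only constructed, never sent.
uri = search_uri("http://example.org/search", "D\u00fcrst")
req = order_request("http://example.org/order", {"item": "book", "qty": "1"})
```

The search ends up as a plain URI that can be stored and revisited, whereas the order exists only as a request body, which is what lets a browser ask before repeating it.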
>
>
>Regards,    Martin.

Received on Thursday, 7 August 2003 11:46:48 UTC