Re: Forms and CharSets from Yung-Fong Tang on 1999-09-22 (www-international@w3.org from July to September 1999)

From: Yung-Fong Tang <ftang@netscape.com>
Date: Wed, 22 Sep 1999 00:21:47 -0700
To: "Martin J. Duerst" <duerst@w3.org>
CC: George Spafford <george_spafford@lionbridge.com>, www-international@w3.org, ij@w3.org
Message-ID: <37E8838A.900840FA@netscape.com>
"Martin J. Duerst" wrote:

> At 14:42 99/09/15 -0700, Yung-Fong Tang wrote:
> >>>>
>
>      The name "accept-charset" itself is very misleading.
>
> <<<<
>
> The name is not misleading.
>

If the name were not misleading, people would not think it indicates which characters the client can accept, right? The original question itself proves the name is misleading. Does it mean:
1. What the server could accept, or
2. What the input fields displayed by the client could accept?
Using this name in HTTP is not misleading. Using this name in HTML is misleading.

>
> >>>>
>
>      The origin of the name "Accept-Charset" is the HTTP 1.1 protocol. In HTTP, Accept-Charset is sent by the client to the server to indicate which charsets the client can handle.
>
>      Somehow "Accept-Charset" got put into HTML 4.0 with a funny statement:
>
>           accept-charset = charset list [CI]
>                This attribute specifies the list of character encodings for input data that must be accepted by the server processing this form. The
>                value is a space- and/or comma-delimited list of charset values. The server must interpret this list as an exclusive-or list, i.e., the
>                server must be able to accept any single character encoding per entity received.
>
>                The default value for this attribute is the reserved string "UNKNOWN". User agents may interpret this value as the character
>                encoding that was used to transmit the document containing this FORM element.
>
>      The reason I call it a "funny statement" is that while the client can tell the server which charsets it can accept, it is not reasonable for a form (which may sit on site A, B, or C) to tell the client which charsets the CGI (which may be located on site D) can accept.
>
> <<<<
>
> The idea is not that the form tells the server what kind of charsets
> the server has to accept, but that the form tells the browser what
> kind of charsets the server actually accepts.

But this is not reliable, right? How can HTML specify the HTTP server's limitations or features in a reliable way? An HTTP header could provide reliable information about what the server can accept. It is wrong for an HTML document to indicate what the server can accept. And the funniest part is that the HTML 4.0 editor simply copied the text
from the HTTP 1.1 spec without modifying it correctly. The original text simply describes the Accept-Charset that the client sends out. How can an HTML spec specify something like "The server must..."?

>
>
> While the form may come from a different server than the server
> where the CGI script is located, the author of the form has to
> have some knowledge about what the CGI script can handle, and
> assuming that it can know about what charsets the CGI script
> can handle is a reasonable extension.
>
> The text above is indeed a bit unclear, and probably should be fixed.

I agree it should be fixed.

>
>
> >>>>
>
>      Also, it should be the client software that interprets the list here; how can the server interpret that list? Since there is no mention of the client, a client could ignore this field and still conform to the spec.
>
>      The "accept" really means what the SERVER can accept here (in the case of an HTML form, not HTTP 1.1). Therefore, it does not mean the browser has to reject the user's input, since the client may accept that input while the server doesn't.
>
>      George Spafford wrote:
>
>           With forms, there is the accept-charset and I am trying to understand its
>           functionality a bit more. I have a situation where users will need to
>           enter data into a database that can *only* handle ISO-8859-1
>           characters. In respect to accept-charset, if a Japanese user is viewing a
>           site in shift-jis and goes to enter data, what will the data entry and
>           submit behavior be if accept-charset is set to iso-8859-1?
>
> <<<<
>
> According to the idea behind accept-charset, the data will be submitted
> in iso-8859-1. But I am not sure how many browsers actually implement this.

I don't care what "the idea behind ..." is. Once you have a standard/spec, the idea behind it does not matter. After the spec/standard is published, the important thing is what the text of the spec says, not the original idea. The spec/standard should make the description clear, so that people who read it know how
it works from a clear definition. The accept-charset feature is poorly specified in HTML 4.0.

>
>
> >>>>
>
>           I must
>           apologize for asking this, but I'm in a crunch right now and don't have
>           time to do the simulation. I'm hoping I can leverage someone else's
>           experience.
>
>           Can the shift-jis user enter data at all? What encoding will the browser
>           use assuming the rest of the page is in shift-jis and accept-charset is set
>           to iso-8859-1? The underlying picture is that I want users of other
>           languages to still be able to enter data into this database via an HTML
>           form provided they can read/write English while the rest of the page is
>           still in their native tongue. Ideally, the entry would not bomb when they
>           hit submit but instead either shift the browser to 8859-1 or alert the user
>           that we can't handle the input.
>
> <<<<
>
> An easy solution assuming the currently most widely deployed browsers is
> the following:
>
> - Use two pages.

You can actually use two FRAMEs, or probably two layers.

>
>
> - The first page is in Japanese, and explains things.
>
> - The actual form page is in iso-8859-1 (make sure that your server/page
> contain this info, or the browser may assume it's in shift-jis or so).
> Most browsers will then actually send back iso-8859-1.
>
> An alternative is to assume that your Japanese users won't use anything
> more than the ASCII subset of iso-8859-1. In that case, you can just
> use one page, and filter the results either at the browser (as Frank
> has suggested) or at the server. Which alternative to take depends
> on your application.
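
As a rough illustration of the filtering alternative described above, the following JavaScript sketch checks whether submitted text stays within iso-8859-1 or within its ASCII subset. The function names isLatin1 and isAscii are hypothetical, and the check assumes the text is available as a JavaScript string, so that iso-8859-1 corresponds to code units U+0000 through U+00FF. It is a sketch under those assumptions, not a statement of how any particular browser or server behaves.

```javascript
// Hypothetical helper: returns true when every character of the text
// fits in iso-8859-1, i.e. every UTF-16 code unit is at most 0xFF.
function isLatin1(text) {
  for (let i = 0; i < text.length; i++) {
    if (text.charCodeAt(i) > 0xFF) {
      return false;
    }
  }
  return true;
}

// Hypothetical helper for the stricter "ASCII subset" case:
// every code unit must be at most 0x7F.
function isAscii(text) {
  for (let i = 0; i < text.length; i++) {
    if (text.charCodeAt(i) > 0x7F) {
      return false;
    }
  }
  return true;
}
```

The same test could run in the browser before submission or in the server-side script, depending on where the filtering is done.
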
>
> [I would personally like it if more Japanese sites accepted the characters
> in iso-8859-1; this would allow me to spell out my name :-]
>
> Regards, Martin.
>
> >>>>
>
>           I can't revise the database at this time and am trying to come up with some
>           workarounds. Longer term, we will change the database structures and all
>           the collateral scripts.
>
>           Any thoughts?
>
>      In any case, I think this is the wrong approach to your particular problem. If you care about form validation, use a JavaScript onChange handler to scan your data, and prompt the user if it contains any text you don't want to see. For example, you should prompt the user if s/he types A-Za-z into a telephone field.
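
The telephone-field example above might look roughly like the following JavaScript. This is a minimal sketch: the names containsLetters and checkPhoneField, the field name "phone", and the alert wording are all assumptions made for illustration, not part of the original suggestion.

```javascript
// Hypothetical check: true if the value contains any of A-Z or a-z.
function containsLetters(value) {
  return /[A-Za-z]/.test(value);
}

// Hypothetical onchange handler. "field" is the input element that changed.
// Prompts the user and returns false when letters are found.
function checkPhoneField(field) {
  if (containsLetters(field.value)) {
    alert("Please enter digits only in the telephone field.");
    field.focus();
    return false;
  }
  return true;
}

// Assumed wiring in the page:
//   <input type="text" name="phone" onchange="checkPhoneField(this)">
```

Because the check runs only in the browser, the server-side script would still need its own validation for clients that have scripting disabled.
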
>
>           --G--
>
>      Restored attachment file: "c:\program files\eudora\attach\ftang27.vcf"
>
> <<<<
>
> #-#-# Martin J. Du"rst, World Wide Web Consortium
> #-#-# mailto:duerst@w3.org http://www.w3.org
Received on Wednesday, 22 September 1999 03:22:49 UTC