Re: Forms and CharSets from Yung-Fong Tang on 1999-09-15 (www-international@w3.org from July to September 1999)

From: Yung-Fong Tang <ftang@netscape.com>
Date: Wed, 15 Sep 1999 14:42:35 -0700
To: George Spafford <george_spafford@lionbridge.com>
CC: www-international@w3.org
Message-ID: <37E012CA.6E10384E@netscape.com>
The name "accept-charset" itself is very misleading. The name "Accept-Charset"
originates in the HTTP 1.1 protocol. In HTTP, Accept-Charset is sent by the
client to the server to indicate which charsets the client can handle.
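
To make the HTTP meaning concrete, here is a minimal sketch of how a server
might read that header. The function name parseAcceptCharset and the sample
header value are my own assumptions for illustration, not taken from any
library or from a real request.

```javascript
// Hypothetical helper (illustration only): parse an HTTP Accept-Charset
// header value into a list of charsets ordered by the client's preference.
function parseAcceptCharset(headerValue) {
  return headerValue
    .split(",")
    .map((item) => item.trim())
    .filter((item) => item.length > 0)
    .map((item) => {
      const [name, ...params] = item.split(";").map((s) => s.trim());
      let q = 1.0; // HTTP/1.1 default quality value when no q is given
      for (const p of params) {
        const m = /^q\s*=\s*([0-9.]+)$/i.exec(p);
        if (m) q = parseFloat(m[1]);
      }
      return { charset: name.toLowerCase(), q };
    })
    .sort((a, b) => b.q - a.q); // highest preference first
}

// Assumed sample value, as a browser might send it.
const prefs = parseAcceptCharset("iso-8859-1, shift_jis;q=0.8, *;q=0.1");
console.log(prefs.map((p) => p.charset));
```

Note the direction: this list describes what the client can handle, which is
the opposite of what the HTML attribute of the same name tries to express.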

Somehow "Accept-Charset" got put into HTML 4.0 with this funny statement:

> accept-charset = charset list [CI]
>      This attribute specifies the list of character encodings for input data that must be accepted by the server processing this form. The
>      value is a space- and/or comma-delimited list of charset values. The server must interpret this list as an exclusive-or list, i.e., the
>      server must be able to accept any single character encoding per entity received.
>
>      The default value for this attribute is the reserved string "UNKNOWN". User agents may interpret this value as the character
>      encoding that was used to transmit the document containing this FORM element.
>
The reason I call it a "funny statement" is that, while the client can tell
the server what charsets it can accept, it is not reasonable for a form (which
may sit on site A, B, or C) to tell the client what charsets the CGI (which may
be located on site D) can accept. Also, it should be the client software that
interprets the list here; how can the server interpret that list? Since the
spec says nothing about the client, a client can ignore this attribute and
still conform to the spec.

The "accept" here really means what the SERVER can accept (in the case of an
HTML form, not HTTP 1.1). Therefore, it does not mean the browser has to reject
the user's input, since the client may accept input that the server does not.

George Spafford wrote:

> With forms, there is the accept-charset and I am trying to understand its
> functionality a bit more.  I have a situation where users will need to
> enter data into a database that can *only* handle ISO-8859-1
> characters.  In respect to accept-charset, if a Japanese user is viewing a
> site in shift-jis and goes to enter data, what will the data entry and
> submit behavior be if accept-charset is set to iso-8859-1?  I must
> apologize for asking this, but I'm in a crunch right now and don't have
> time to do the simulation.  I'm hoping I can leverage someone else's
> experience.
>
> Can the shift-jis user enter data at all?  What encoding will the browser
> use assuming the rest of the page is in shift-jis and accept-charset is set
> to iso-8859-1?  The underlying picture is that I want users of other
> languages to still be able to enter data into this database via an HTML
> form provided they can read/write English while the rest of the page is
> still in their native tongue.  Ideally, the entry would not bomb when they
> hit submit but instead either shift the browser to 8859-1 or alert the user
> that we can't handle the input.
>
> I can't revise the database at this time and am trying to come up with some
> workarounds.  Longer term, we will change the database structures and all
> the collateral scripts.
>
> Any thoughts?

In any case, I think accept-charset is the wrong tool for your particular
problem. If you care about form validation, use a JavaScript onChange handler
to scan the data and prompt the user if it contains any text you don't want to
see. For example, you should prompt the user if s/he types A-Za-z into a
telephone field.
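
A minimal sketch of such a check, assuming the database accepts exactly the
ISO-8859-1 range (code points U+0000 to U+00FF). The function names, the
"phone" element id, and the message texts are hypothetical and only meant to
show the idea.

```javascript
// Hypothetical helpers (illustration only).

// True if every character fits in ISO-8859-1 (U+0000..U+00FF).
function isLatin1(text) {
  for (const ch of text) {
    if (ch.codePointAt(0) > 0xff) return false;
  }
  return true;
}

// True if the text contains only digits, spaces, and + - ( ) . characters.
function isPhoneLike(text) {
  return /^[0-9 +\-().]+$/.test(text);
}

// Returns a message to show the user, or null if the value is acceptable.
function validateField(value, kind) {
  if (!isLatin1(value)) {
    return "Please use only ISO-8859-1 (Western European) characters.";
  }
  if (kind === "phone" && !isPhoneLike(value)) {
    return "Please enter digits only in the telephone field.";
  }
  return null;
}

// Browser wiring; skipped when no DOM is available.
if (typeof document !== "undefined") {
  const field = document.getElementById("phone"); // assumed element id
  if (field) {
    field.onchange = () => {
      const message = validateField(field.value, "phone");
      if (message) alert(message);
    };
  }
}
```

Since a client can skip the script entirely, the CGI should still run the same
check on the data it receives.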


>
>
> --G--
Received on Wednesday, 15 September 1999 17:44:48 UTC