
Re: Forms and CharSets

From: George Spafford <george_spafford@lionbridge.com>
Date: Thu, 16 Sep 1999 08:11:38 -0400
Message-Id: <4.2.0.58.19990916080557.0509cb70@199.93.198.2>
To: <ftang@netscape.com>
Cc: www-international@w3.org
Thank you for the clarification.  It sounds like we may need to do
something via JavaScript, and perhaps a control, to make the validation
process happen.  Thank you, Frank.

--G--

At 02:42 PM 9/15/99 -0400, ftang@netscape.com wrote:

>
>The name "accept-charset" itself is very misleading. The name
>"Accept-Charset" originates in the HTTP 1.1 protocol, where the client sends
>the Accept-Charset header to the server to indicate which charsets the client
>can handle.
>
>Somehow "accept-charset" got into HTML 4.0, with this funny statement:
>
> > accept-charset = charset list [CI]
> >      This attribute specifies the list of character encodings for input
> >      data that must be accepted by the server processing this form. The
> >      value is a space- and/or comma-delimited list of charset values.
> >      The server must interpret this list as an exclusive-or list, i.e.,
> >      the server must be able to accept any single character encoding per
> >      entity received.
> >
> >      The default value for this attribute is the reserved string
> >      "UNKNOWN". User agents may interpret this value as the character
> >      encoding that was used to transmit the document containing this
> >      FORM element.
> >
>The reason I say it is a "funny statement" is that while the client can tell
>the server what charsets it can accept, it is not reasonable for a form (which
>may sit on site A, B, or C) to tell the client what charsets the CGI (which may
>be located on site D) can accept. Also, it should be the client software that
>interprets the list here; how can the server interpret that list? Since not a
>word is said about the client, a client could ignore this attribute and still
>conform to the spec.
>
>The "accept" here really means what the SERVER can accept (in the case of an
>HTML form, not HTTP 1.1). Therefore, it does not mean the browser has to reject
>the user's input, since the client may accept that input even though the server
>doesn't.
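The HTML 4.0 wording quoted above fixes only the syntax of the attribute: a case-insensitive, space- and/or comma-delimited list, with the reserved default "UNKNOWN" that a user agent may read as the encoding of the page holding the form. A minimal TypeScript sketch of that parsing step follows. The function name `parseAcceptCharset` and its fallback argument are hypothetical, and the sketch says nothing about what a browser then does with the list, which is exactly the gap Frank points out.

```typescript
// Hypothetical helper: turn an accept-charset attribute value into the list of
// candidate encodings described by the HTML 4.0 text quoted above.
// It does not decide which single encoding a client should pick.
function parseAcceptCharset(attr: string | null, documentEncoding: string): string[] {
  // The value is case-insensitive ([CI]) and space- and/or comma-delimited.
  const charsets = (attr ?? "")
    .split(/[\s,]+/)
    .map((name) => name.toLowerCase())
    .filter((name) => name.length > 0 && name !== "unknown");

  // A missing attribute or the reserved default "UNKNOWN" may be taken to mean
  // the encoding used to transmit the document containing the FORM element.
  return charsets.length > 0 ? charsets : [documentEncoding.toLowerCase()];
}
```

For example, a form page sent as Shift_JIS with no accept-charset attribute yields `["shift_jis"]`, while `accept-charset="ISO-8859-1, utf-8"` yields `["iso-8859-1", "utf-8"]`. Lowercasing reflects the [CI] marker only; the sketch does not check that the names are registered charsets.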
>
>George Spafford wrote:
>
> > With forms, there is the accept-charset and I am trying to understand its
> > functionality a bit more.  I have a situation where users will need to
> > enter data into a database that can *only* handle ISO-8859-1
> > characters.  In respect to accept-charset, if a Japanese user is viewing a
> > site in shift-jis and goes to enter data, what will the data entry and
> > submit behavior be if accept-charset is set to iso-8859-1?  I must
> > apologize for asking this, but I'm in a crunch right now and don't have
> > time to do the simulation.  I'm hoping I can leverage someone else's
> > experience.
> >
> > Can the shift-jis user enter data at all?  What encoding will the browser
> > use assuming the rest of the page is in shift-jis and accept-charset is set
> > to iso-8859-1?  The underlying picture is that I want users of other
> > languages to still be able to enter data into this database via an HTML
> > form provided they can read/write English while the rest of the page is
> > still in their native tongue.  Ideally, the entry would not bomb when they
> > hit submit but instead either shift the browser to 8859-1 or alert the user
> > that we can't handle the input.
> >
> > I can't revise the database at this time and am trying to come up with some
> > workarounds.  Longer term, we will change the database structures and all
> > the collateral scripts.
> >
> > Any thoughts?
>
>In any case, I think relying on accept-charset is the wrong approach to your
>particular problem. If you care about form validation, use a JavaScript
>onChange handler to scan your data, and prompt the user if any text you don't
>want to see is there. For example, you should prompt the user if s/he types
>letters (A-Za-z) into a telephone field.
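Frank's suggestion can be sketched for George's case: scan the field when it changes and warn if it contains anything ISO-8859-1 cannot hold. The first 256 Unicode code points (U+0000 to U+00FF) correspond one-to-one to ISO-8859-1, so the check looks for code points above U+00FF. This is a minimal TypeScript sketch, not code from the thread; `TextFieldLike`, `findNonLatin1`, and `checkLatin1Field` are hypothetical names, and `notify` stands in for `window.alert` so the logic can run outside a browser.

```typescript
// Minimal stand-in for the one property of a form field this sketch reads.
interface TextFieldLike {
  value: string;
}

// Returns each distinct character in `text` that ISO-8859-1 cannot represent,
// i.e. any character whose Unicode code point is above U+00FF.
function findNonLatin1(text: string): string[] {
  const bad: string[] = [];
  // for...of walks the string by code point, so surrogate pairs stay intact.
  for (const ch of text) {
    const cp = ch.codePointAt(0) ?? 0;
    if (cp > 0xff && bad.indexOf(ch) === -1) {
      bad.push(ch);
    }
  }
  return bad;
}

// Hypothetical onChange-style check. Returns true when the field can be stored
// as ISO-8859-1; otherwise reports the offending characters through `notify`.
function checkLatin1Field(field: TextFieldLike, notify: (msg: string) => void): boolean {
  const bad = findNonLatin1(field.value);
  if (bad.length === 0) {
    return true;
  }
  notify(
    "Sorry, these characters cannot be stored: " + bad.join(" ") +
      ". Please use Western European (ISO-8859-1) characters only.",
  );
  return false;
}
```

In a page, the check might be attached as `field.onchange = () => checkLatin1Field(field, (msg) => window.alert(msg));`. The same pattern covers Frank's telephone example by testing for letters (A-Za-z) instead. Because a browser-side check can be bypassed or turned off, the script that writes to the database would still need its own check.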
>
>
> >
> >
> > --G--
>


George Spafford
Director of Development
Lionbridge Technologies
950 Winter Street, Suite 2410
Waltham, MA  02451-1291

Telephone:  781-434-6111 (direct)
Operator:    781-895-9889 x6111	
Facsimile:   781-890-3122
eFAX:         847-574-0658
Received on Thursday, 16 September 1999 08:17:19 GMT
