RE: question from Thierry Sourbier on 2002-04-11 (www-international@w3.org from April to June 2002)

From: Thierry Sourbier <ml@i18nGurus.com>
Date: Thu, 11 Apr 2002 16:32:48 +0200
To: <www-international@w3.org>
Message-ID: <AMEFKKHPAGEPOAOBAOBMCEDNCFAA.ml@i18nGurus.com>

Your characters are being converted into HTML numeric character
references (NCRs), e.g. &#54466;. The number is the decimal value of the
character's Unicode code point. There should be no need to subtract
0x10000.
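
For illustration, here is a rough Python sketch of decoding such a value on
the server. The function name decode_ncr and the Latin-1 default are my own
assumptions, not something your setup necessarily uses. My guess as to why
subtracting 65536 appears to work: ChrW also accepts negative 16-bit values
and wraps them around, so both numbers end up naming the same code point.

```python
import re
from urllib.parse import unquote_plus

# Matches decimal numeric character references such as "&#54466;".
_NCR = re.compile(r"&#(\d+);")

def _replace(match):
    code_point = int(match.group(1))
    # Leave the text untouched if the number is not a valid code point.
    return chr(code_point) if code_point <= 0x10FFFF else match.group(0)

def decode_ncr(raw_value, page_charset="latin-1"):
    """Percent-decode one form value, then expand decimal NCRs.

    page_charset is the (assumed) charset the form page was served in.
    """
    text = unquote_plus(raw_value, encoding=page_charset)
    return _NCR.sub(_replace, text)

# "%26%23" is the percent-encoded "&#", "%3B" the encoded ";".
decoded = decode_ncr("%26%2354466%3B")
print(hex(ord(decoded)))
```

One caveat: a user who literally types the text "&#54466;" into the field
cannot be told apart from a browser-generated reference, so this decoding is
only a workaround.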

This behavior occurs because the character cannot be represented in the
character set of the page (e.g. typing Korean in a Latin-1 form). To my
knowledge it is specific to Internet Explorer, and other browsers are likely
to send question marks back to the server instead of NCRs :( If your form
needs to support multilingual input, the best solution is to use UTF-8
for the page; you'll get more consistent behavior across browsers.
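
As a small sketch of the difference (the helper names can_encode and
parse_utf8_form are made up for this example, and it assumes the browser
submits the form in the page's charset): Korean text has no representation
in Latin-1, but it round-trips cleanly through a UTF-8 form body.

```python
from urllib.parse import parse_qs, urlencode

def can_encode(text, charset):
    """Return True if every character of text exists in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

def parse_utf8_form(body):
    """Parse an application/x-www-form-urlencoded body as UTF-8.

    errors="strict" makes malformed input raise instead of being
    silently replaced.
    """
    return parse_qs(body, encoding="utf-8", errors="strict")

korean = "\ud55c\uae00"  # two Hangul syllables

# Simulate what a browser would send from a UTF-8 page.
body = urlencode({"name": korean}, encoding="utf-8")
fields = parse_utf8_form(body)

print(can_encode(korean, "latin-1"), can_encode(korean, "utf-8"))
print(fields["name"][0] == korean)
```

With the page in UTF-8, no NCR decoding step is needed on the server.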

Hope this helps,
Thierry.

+---------------------------------------->
www.i18nGurus.com - The Open Internationalization Resources Directory.



-----Original Message-----
From: www-international-request@w3.org
[mailto:www-international-request@w3.org] On behalf of
bruce.wallman@us.pwcglobal.com
Sent: Thursday, 11 April 2002 15:01
To: www-international@w3.org
Subject: question




Hello. I need help in reading Korean (and Japanese) characters
arriving at the server via HTTP. The data is in response to text input
fields on an HTML form. I am receiving some characters that in the
HTTP input stream show as things like %2354466;

I have found that the %23 is a # sign and that subtracting 65536 from
the remaining 5-digit number and then taking the ChrW of the
result gives me the right ideograph for the many that I have tested.
Is this all there is to it? Are there limits to the algorithm, such that
subtracting 65536 only works for a certain range of these
characters and some other calculation is needed for others?
Is the same process that is working on a Korean machine likely
to work on a Japanese one? I am supplying the browser with
character set meta tags and go through the same routines when
receiving DBCS languages.

Thanks ahead.

Regards


Received on Thursday, 11 April 2002 10:34:02 UTC