Re: HTTP arriving at the server

From: <bruce.wallman@us.pwcglobal.com>
Date: Fri, 12 Apr 2002 10:42:16 -0400
To: www-international-request@w3.org
Cc: www-international@w3.org
Message-ID: <OF31BC965F.E15259F0-ON85256B99.004FE290@nam.pwcinternal.com>

This is not a futures question, so I am only concerned with the character
sets that IE currently supports. Given that constraint, are there Unicode
characters 32767 and below that will arrive at the server as &#12345; and
need a different translation? Obviously, anything 9999 and below would fall
into this group. Will I really see any in the &#199999; range?

Does anyone know enough about Korean to tell me which keys to press on the
Korean keyboard to test values below 32768 or above 99999? Does anyone know
of a table somewhere (it would be big) that shows the translation of the
numeric values arriving over HTTP to ideographs?



Hi Bruce,

>Will what I am doing work generally for all complex DBCS ideographs? Is it
>in any way 'Korean' dependent? Are there other complex DBCS patterns that
>I have not seen that require a different algorithm (for example, will I see
>some numbers that are 4 or 6 digits rather than 5 or will I see some
>numbers for which I should not subtract 65536, etc.)?

My guess is that subtracting 65536 works for Korean because the VB function
expects the input wide-character code point to be a signed short (16 bits).
If that guess is correct, the same logic should work for any Unicode
character with a code point greater than 32767, not only Korean ones.
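
To illustrate the guess above, here is a minimal sketch in Python (not VB) of
the signed-short mapping. The function names are made up for this example,
and it assumes the input is a BMP code point taken from a decimal NCR.

```python
def to_signed_short(code_point: int) -> int:
    """Map a BMP code point (0..65535) to its signed 16-bit value.

    Hypothetical helper: values above 32767 wrap to negative numbers,
    which is the "subtract 65536" step discussed in the thread.
    """
    if not 0 <= code_point <= 0xFFFF:
        raise ValueError("not a BMP code point: %d" % code_point)
    return code_point - 65536 if code_point > 32767 else code_point


def from_signed_short(value: int) -> int:
    """Inverse mapping: recover the unsigned code point."""
    if not -32768 <= value <= 32767:
        raise ValueError("not a signed 16-bit value: %d" % value)
    return value + 65536 if value < 0 else value


# 44032 is U+AC00, the first Hangul syllable.
signed = to_signed_short(44032)
print(signed, hex(from_signed_short(signed)))
```

Code points 32767 and below pass through unchanged, so under this assumption
only the values above 32767 need the subtraction.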

Of course, an NCR can have 4 or 6 digits, not only 5, although most Hangul
and Asian ideograph characters fall in the 5-digit range.  In addition,
many newly defined characters lie outside the Unicode BMP. For example,
some characters added in the new Chinese standard called GB18030 have code
points of 65536 and above.  These "beyond BMP" characters are represented
by decimal NCRs of 5 to 7 digits (65536 through 1114111), although the
current MS IE might not generate such NCRs when submitting an HTML form
data set. (I think MS IE generates a pair of illegal NCRs based on the
Unicode high/low surrogates in this case.)
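
If a browser really does send a surrogate pair as two NCRs, the server could
recombine them. The following is only a sketch in Python under that
assumption; decode_ncrs and combine_surrogates are hypothetical names, and
only decimal NCRs are handled.

```python
import re

_NCR = re.compile(r"&#(\d+);")
_PAIR = re.compile(r"([\ud800-\udbff])([\udc00-\udfff])")


def combine_surrogates(high: int, low: int) -> int:
    """Standard UTF-16 arithmetic: surrogate pair to one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def decode_ncrs(text: str) -> str:
    """Replace decimal NCRs with characters, merging surrogate pairs."""
    def to_char(match):
        number = int(match.group(1))
        # Leave out-of-range numbers untouched rather than guess.
        return chr(number) if number <= 0x10FFFF else match.group(0)

    raw = _NCR.sub(to_char, text)
    # A high surrogate directly followed by a low surrogate becomes
    # the single supplementary character it encodes.
    return _PAIR.sub(
        lambda m: chr(combine_surrogates(ord(m.group(1)), ord(m.group(2)))),
        raw,
    )


# U+20000 sent either as one 6-digit NCR or as a surrogate pair.
print(decode_ncrs("&#131072;") == decode_ncrs("&#55360;&#56320;"))
```

A lone surrogate NCR is passed through as is in this sketch; real code would
need to decide whether to reject it.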

-Yoshito Umaoka

Received on Friday, 12 April 2002 10:41:57 UTC
