Re: HTTP arriving at the server from Yoshito_Umaoka@lotus.co.jp on 2002-04-12 (www-international@w3.org from April to June 2002)

From: <Yoshito_Umaoka@lotus.co.jp>
Date: Fri, 12 Apr 2002 10:26:46 -0400
To: www-international@w3.org
Message-ID: <OFB3B9AA2B.02791FED-ON85256B99.004D1DCB-85256B99.004F5CFC@lotus.com>

Hi Bruce,

>Will what I am doing work generally for all complex DBCS ideographs? Is it
>in any way 'Korean' dependent? Are there other complex DBCS patterns that
>I have not seen that require a different algorithm (for example, will I see
>some numbers that are 4 or 6 digits rather than 5 or will I see some
>numbers for which I should not subtract 65536, etc.)?

I guess the reason why subtracting 65536 works well for Korean is that
the VB function expects the input wide-character code point to be a signed
short (a 16-bit value from -32768 to 32767).  If my guess is correct, the
same logic should work for any Unicode character with a code point between
32768 and 65535, not only for Korean.
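To make the guess above concrete, here is a minimal sketch in Python (not
VB) of the signed-short mapping.  The function names are hypothetical and
are not taken from Bruce's code; the sketch only assumes that the target
function wants a signed 16-bit value.

```python
def to_signed_short(code_point: int) -> int:
    """Map a BMP code point (0..65535) into the signed 16-bit range.

    Assumption: the consuming function expects -32768..32767, so values
    above 32767 wrap around by subtracting 65536.
    """
    if not 0 <= code_point <= 0xFFFF:
        raise ValueError("only BMP code points fit in 16 bits")
    return code_point - 65536 if code_point > 32767 else code_point


def from_signed_short(value: int) -> int:
    """Inverse mapping: recover the unsigned code point."""
    if not -32768 <= value <= 32767:
        raise ValueError("not a signed 16-bit value")
    return value + 65536 if value < 0 else value


# U+AC00 (first Hangul syllable) is 44032, which is above 32767.
hangul_signed = to_signed_short(0xAC00)
# U+4E00 (a CJK ideograph) is 19968, which is left unchanged.
cjk_signed = to_signed_short(0x4E00)
```

Nothing in the subtraction is specific to Korean; it depends only on
whether the code point is above 32767.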

Of course, an NCR can have 4 or 6 digits, not only 5, although most
Hangul and Asian ideographic characters fall in the 5-digit range.  In
addition, many characters are newly defined outside the Unicode BMP.
For example, some characters added in the new Chinese standard called
GB18030 have code points of 65536 or above.  These "beyond BMP" characters
can be represented by decimal NCRs of up to 7 digits (the largest Unicode
code point is 1114111), although the current MS IE might not generate such
NCRs when submitting an HTML form data set.  (I think MS IE generates a
pair of illegal NCRs based on the Unicode high/low surrogates in this case.)
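
As a rough illustration of the surrogate case, here is a Python sketch that
decodes decimal NCRs and joins a high/low surrogate pair of NCRs into the
single character they stand for.  It assumes the browser really does send
such pairs, which is only my guess; decode_ncrs and combine_surrogates are
hypothetical names.

```python
import re

_NCR = re.compile(r"&#(\d+);")


def combine_surrogates(high: int, low: int) -> int:
    """Standard UTF-16 arithmetic: surrogate pair -> code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def decode_ncrs(text: str) -> str:
    """Replace decimal NCRs with characters, joining surrogate pairs.

    Each NCR becomes one code unit first; a round trip through UTF-16
    then merges adjacent high/low surrogates.  An unpaired surrogate
    makes the final decode raise UnicodeDecodeError.
    """
    raw = _NCR.sub(lambda m: chr(int(m.group(1))), text)
    return raw.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


# U+20000 (a CJK Extension B ideograph) written two ways:
direct = decode_ncrs("&#131072;")           # one 6-digit NCR
paired = decode_ncrs("&#55360;&#56320;")    # surrogate pair as two NCRs
hangul = decode_ncrs("&#44032;")            # ordinary 5-digit BMP NCR
```

With this, a server does not need to assume 5 digits: the digit count is
whatever the decimal value of the code point requires.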

-Yoshito Umaoka

Received on Friday, 12 April 2002 10:27:26 UTC