- From: <Yoshito_Umaoka@lotus.co.jp>
- Date: Fri, 12 Apr 2002 10:26:46 -0400
- To: www-international@w3.org
Hi Bruce,

>Will what I am doing work generally for all complex DBCS ideographs? Is it
>in any way 'Korean' dependent? Are there other complex DBCS patterns that I
>have not seen that require a different algorithm (for example, will I see
>some numbers that are 4 or 6 digits rather than 5 or will I see some
>numbers for which I should not subtract 65536, etc.)?

I guess the reason subtracting 65536 works well for Korean is that the VB function expects the input wide-character code point to be a signed short (a signed 16-bit integer). If my guess is correct, the same logic should work for any Unicode character whose code point is greater than 32767 but still within the BMP (at most 65535).

Of course, an NCR can have 4 or 6 digits, not only 5, although most Hangul and Asian ideographic characters fall in the 5-digit range.

In addition, many characters are now being defined outside the Unicode BMP. For example, characters added in the new Chinese standard GB18030 have code points of 65536 and above. These "beyond BMP" characters can be represented by decimal NCRs of up to 7 digits (the largest Unicode code point, U+10FFFF, is 1114111), although the current MS IE might not generate such NCRs when submitting an HTML form data set. (I think MS IE generates a pair of illegal NCRs based on the Unicode high/low surrogates in this case.)

-Yoshito Umaoka
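To make the arithmetic above concrete, here is a minimal sketch in Python (not the VB code under discussion). It assumes decimal NCRs of the form `&#NNNNN;`, and it assumes that a browser may send one supplementary character as two NCRs holding a high and a low surrogate. The names `to_signed_16` and `decode_ncrs` are hypothetical and exist only for this illustration.

```python
import re

# Decimal numeric character references such as "&#44032;" (assumed input form).
NCR = re.compile(r"&#([0-9]+);")
# A high surrogate immediately followed by a low surrogate.
SURROGATE_PAIR = re.compile("([\ud800-\udbff])([\udc00-\udfff])")


def to_signed_16(value):
    """Map a 16-bit value (0..65535) to the signed-short range.

    Values above 32767 become negative, which is the 'subtract 65536' step.
    """
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value does not fit in 16 bits")
    return value - 65536 if value > 32767 else value


def decode_ncrs(text):
    """Replace decimal NCRs with characters, joining surrogate-pair NCRs."""

    def from_ncr(match):
        number = int(match.group(1))
        # Leave out-of-range references untouched rather than guessing.
        return chr(number) if number <= 0x10FFFF else match.group(0)

    def join_pair(match):
        high, low = ord(match.group(1)), ord(match.group(2))
        return chr(0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00))

    return SURROGATE_PAIR.sub(join_pair, NCR.sub(from_ncr, text))
```

For example, `to_signed_16(44032)` gives the negative value a signed-short API would expect for the Hangul syllable U+AC00, while `decode_ncrs("&#55360;&#56320;")` and `decode_ncrs("&#131072;")` both yield the same supplementary character, U+20000.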
Received on Friday, 12 April 2002 10:27:26 UTC