- From: Terry Teague <terry_teague@users.sourceforge.net>
- Date: Fri, 14 Sep 2001 23:58:45 -0700
- To: html-tidy@w3.org
At 12:42 PM -0700 9/14/01, Rick Cameron wrote: >I've just submitted this problem at the SourceForge site. >I believe the code in tidy.c (revision 1.35) that reads Shift-JIS characters >has a problem. At line 1040 there appears to be the assumption that any byte >value greater than 127 is a lead byte (i.e. the first byte of a two-byte >character). I believe this is true of Big5 - but it is certainly not true of >Shift-JIS. In the diagram at >http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnintl/html >/S24CF.asp?frame=true you can see that the values from 0xa1 through 0xdf >represent singe-byte characters. Thanks for the feedback - the Big5 and Shift_JIS encoding support is preliminary, is based on earlier code by Rick Jelliffe, and is by default turned off in the current source (I'm not sure if Charlie turned it on for the beta builds he release recently). I am currently looking at another bug I discovered in the process of testing the Big5/Shift_JIS code changes. I'm not the expert in this area, and I welcome your continued feedback. Regards, Terry
Received on Saturday, 15 September 2001 02:59:43 UTC