W3C home > Mailing lists > Public > html-tidy@w3.org > July to September 2001

RE: Problem processing Shift-JIS

From: Terry Teague <terry_teague@users.sourceforge.net>
Date: Fri, 14 Sep 2001 23:58:45 -0700
Message-Id: <l03130302b7c8abc89700@[17.219.108.11]>
To: html-tidy@w3.org
At 12:42 PM -0700 9/14/01, Rick Cameron wrote:
>I've just submitted this problem at the SourceForge site.

>I believe the code in tidy.c (revision 1.35) that reads Shift-JIS characters
>has a problem. At line 1040 there appears to be the assumption that any byte
>value greater than 127 is a lead byte (i.e. the first byte of a two-byte
>character). I believe this is true of Big5 - but it is certainly not true of
>Shift-JIS. In the diagram at
>http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnintl/html
>/S24CF.asp?frame=true you can see that the values from 0xa1 through 0xdf
>represent singe-byte characters.

Thanks for the feedback - the Big5 and Shift_JIS encoding support is
preliminary, is based on earlier code by Rick Jelliffe, and is by default
turned off in the current source (I'm not sure if Charlie turned it on for
the beta builds he release recently). I am currently looking at another bug
I discovered in the process of testing the Big5/Shift_JIS code changes. I'm
not the expert in this area, and I welcome your continued feedback.

Regards, Terry
Received on Saturday, 15 September 2001 02:59:43 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Tuesday, 3 April 2012 06:13:46 GMT