Corruption in page; similar issue in Tidy

Some of the Japanese characters in page
have been corrupted, as the JIS codes contain '<' symbols that were
erroneously converted to '&lt;'. I'm not sure how, when or why this
happened, but I thought you'd like to know anyway.
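
A minimal Python sketch of how this kind of corruption could arise. It assumes the page was stored as 7-bit JIS (ISO-2022-JP) and that something escaped '<' byte by byte without regard to the charset; the actual cause is not known. The sample bytes and the names naive_escape and safe_escape are hypothetical, chosen only for illustration.

```python
import html

# Hypothetical sample: ESC $ B switches to JIS X 0208, the two bytes
# 0x3C 0x2B ('<' and '+' in ASCII) encode one kanji, ESC ( B returns to ASCII.
jis_bytes = b"\x1b$B\x3c\x2b\x1b(B"
original = jis_bytes.decode("iso2022_jp")


def naive_escape(raw: bytes) -> bytes:
    """Escape '<' bytewise, ignoring the charset (the assumed bug)."""
    return raw.replace(b"<", b"&lt;")


def safe_escape(raw: bytes, encoding: str = "iso2022_jp") -> bytes:
    """Decode first, escape the resulting text, then re-encode."""
    text = raw.decode(encoding)
    return html.escape(text, quote=False).encode(encoding)


# The naive version splits the two-byte character and the page no longer
# decodes to the original text.
corrupted = naive_escape(jis_bytes).decode("iso2022_jp", errors="replace")
intact = safe_escape(jis_bytes).decode("iso2022_jp")

print(corrupted == original)  # False
print(intact == original)     # True
```

Escaping after decoding keeps the two bytes of each character together, because the '<' test is then applied to characters rather than to raw bytes.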

I can't seem to get Outlook to include both corrupt and corrected text here,
so I've attached a text and GIF version.

N.B. I noted similar errors with Dave Raggett's HTML-Tidy when used on
Shift-JIS pages via Tidy's ISO-2022 setting. Characters including hi-ASCII
0A0h, which I believe is a non-breaking space in Latin-1, have their second
half converted to &nbsp; and are hence corrupted. Of course that setting is intended
for JIS pages rather than Shift-JIS -- but I thought it might just work, as
you almost never see any 7-bit JIS except in e-mail. In my experience,
almost all Japanese web pages are either Shift-JIS or EUC, with a few
Unicode sites starting to creep in. FWIW Tidy seems to work OK on Shift-JIS
via its Raw setting.
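
The effect described above can be reproduced in a few lines of Python. This is only a sketch under an assumption: that every byte 0A0h is treated as a Latin-1 non-breaking space and rewritten as &nbsp;. It does not show Tidy's actual code, and the function names latin1_style_entities and passthrough are hypothetical.

```python
# Shift-JIS bytes for one double-byte character whose second byte is 0xA0.
sjis_bytes = b"\x88\xa0"
original = sjis_bytes.decode("shift_jis")


def latin1_style_entities(raw: bytes) -> bytes:
    """Rewrite every 0xA0 byte as &nbsp; (assumed behaviour, for illustration)."""
    return raw.replace(b"\xa0", b"&nbsp;")


def passthrough(raw: bytes) -> bytes:
    """Leave all bytes untouched, loosely analogous to a raw mode."""
    return raw


damaged = latin1_style_entities(sjis_bytes)
print(damaged)  # b'\x88&nbsp;'

# The lead byte 0x88 has lost its trail byte, so the character is gone.
print(damaged.decode("shift_jis", errors="replace") == original)  # False
print(passthrough(sjis_bytes).decode("shift_jis") == original)    # True
```

In Shift-JIS the second byte of a double-byte character can fall anywhere in 40h-7Eh or 80h-FCh, so 0A0h is a perfectly ordinary trail byte and any bytewise Latin-1 handling will hit it sooner or later.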

HTH (no reply required)

Ben Jones

Japanese translation / interpreting / typesetting
Tel/fax: +44 1843 847701

Received on Saturday, 16 June 2001 15:04:00 UTC