Corruption in page http://jigsaw.w3.org/css-validator/validator.html.ja; similar issue in Tidy

From: Ben Jones <BJTranslations@bigfoot.com>
Date: Sat, 16 Jun 2001 19:20:33 +0100
Message-ID: <004f01c0f697$4422cd80$5695883e@benjones>
To: <www-validator-css@w3.org>
Cc: <html-tidy@w3.org>
Some of the Japanese characters in page
http://jigsaw.w3.org/css-validator/validator.html.ja
have been corrupted, as the JIS codes contain '<' symbols that were
erroneously converted to '&lt;'. I'm not sure how, when or why this
happened, but I thought you'd like to know anyway.
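
A minimal sketch of how this kind of corruption can arise, assuming the page is stored as ISO-2022-JP (7-bit JIS) and that some tool escaped '<' byte by byte without tracking the JIS escape state. The function name naive_escape and the sample character (JIS code 0x3C21) are illustrative choices, not taken from the validator's code.

```python
# Hypothetical illustration: a naive byte-level '<' escaper applied to
# ISO-2022-JP (7-bit JIS) data. Not the validator's actual code.

ESC_TO_KANJI = b"\x1b$B"  # escape sequence: switch to JIS X 0208
ESC_TO_ASCII = b"\x1b(B"  # escape sequence: switch back to ASCII

# One kanji whose JIS code is 0x3C21; its first byte equals ASCII '<' (0x3C).
raw = ESC_TO_KANJI + b"\x3c\x21" + ESC_TO_ASCII
original = raw.decode("iso2022_jp")


def naive_escape(data: bytes) -> bytes:
    """Escape every 0x3C byte, ignoring whether it is part of a kanji."""
    return data.replace(b"<", b"&lt;")


# The inserted "&lt;" bytes are read as JIS double-byte codes, so the
# original character is lost; invalid pairs are replaced while decoding.
corrupted = naive_escape(raw).decode("iso2022_jp", errors="replace")

print(repr(original), "->", repr(corrupted))
```

An escaper that only touches bytes while the stream is in ASCII mode (between ESC ( B and the next ESC $ B) would leave such characters intact.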

I can't seem to get Outlook to include both corrupt and corrected text here,
so I've attached a text and GIF version.

N.B. I noted similar errors with Dave Raggett's HTML-Tidy when used on
Shift-JIS pages via Tidy's ISO-2022 setting. Characters that include the
high-ASCII byte 0A0h, which I believe is a non-breaking space in Latin-1, have
their second half converted to &nbsp; and are hence corrupted. Of course the
ISO-2022 setting is intended for JIS pages rather than Shift-JIS -- but I thought it might just work, as
you almost never see any 7-bit JIS except in e-mail. In my experience,
almost all Japanese web pages are either Shift-JIS or EUC, with a few
Unicode sites starting to creep in. FWIW Tidy seems to work OK on Shift-JIS
via its Raw setting.
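
The Shift-JIS case can be sketched in the same way, assuming the tool treats every byte as a Latin-1 character and rewrites 0xA0 as an entity. The function latin1_style_cleanup is a hypothetical stand-in for that behaviour, not Tidy's real implementation, and the sample bytes 0x8E 0xA0 are just one Shift-JIS character whose second byte happens to be 0xA0.

```python
# Hypothetical illustration: Latin-1 style NBSP replacement applied to
# Shift-JIS bytes. Not Tidy's actual code.


def latin1_style_cleanup(data: bytes) -> bytes:
    """Replace every 0xA0 byte with the entity text, as if it were an NBSP."""
    return data.replace(b"\xa0", b"&nbsp;")


# One Shift-JIS kanji: lead byte 0x8E, trail byte 0xA0.
sjis_raw = bytes([0x8E, 0xA0])
sjis_original = sjis_raw.decode("shift_jis")

# The trail byte is replaced, leaving an orphaned lead byte followed by
# ASCII text, so the character can no longer be decoded.
sjis_corrupted = latin1_style_cleanup(sjis_raw).decode(
    "shift_jis", errors="replace"
)

# A raw pass-through leaves the bytes, and hence the character, unchanged.
sjis_passthrough = sjis_raw.decode("shift_jis")

print(repr(sjis_original), "->", repr(sjis_corrupted))
```

This is consistent with the observation that a byte-transparent mode preserves Shift-JIS text, since no byte is reinterpreted as a Latin-1 character.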

HTH (no reply required)

Ben Jones

Japanese translation / interpreting / typesetting
Tel/fax: +44 1843 847701
E-mail: BJTranslations@bigfoot.com
Web: http://www.japanesetranslations.co.uk

(image/gif attachment: badtext.gif)

Received on Saturday, 16 June 2001 15:04:00 UTC
