- From: asllearner <old.nabble.99.kyoto@spamgourmet.com>
- Date: Mon, 15 Mar 2010 05:55:18 -0700 (PDT)
- To: html-tidy@w3.org
This is a little complicated, but I will try to explain as clearly as I can.

The basic problem is that when I run tidy on files containing Japanese UTF-8 characters, the output looks like garbage and I get an "invalid UTF-8 character" warning. I am not sure what settings I should use to make the Japanese characters come out correctly.

Here are the details. I am using a Japanese computer with Windows XP. I have been using EditPlus 3 and EditPad Pro as my editors, and tidy-gui to run tidy, though I have the same problem when I use the HTML-Kit tools.

When I type Japanese characters in my HTML file, such as ひらがな (hiragana) or 漢字 (kanji), I can read them fine in my editor, and they appear to be UTF-8 encoded (if I open the file as UTF-8, I can read the text). When I tell tidy that the charset is utf8, or that the input and output encodings are utf8, I get the error and the output is garbage, as far as I can tell.

Here is a specific example:

input:   ひらがな (hiragana; code points U+3072 U+3089 U+304C U+306A)
output:  a?2a??a??a?a
warning: replacing invalid utf-8 char code U+0081

Note that the character code in the warning is not in the string that was passed in!

I have tried various combinations of input and output encoding, including raw, with the other parameters left at their defaults. If I need to be more explicit here, I will.

Any help troubleshooting is greatly appreciated.

Thanks

--
View this message in context: http://old.nabble.com/unicode-characters-not-showing-in-output-tp27874747p27874747.html
Sent from the w3.org - html-tidy mailing list archive at Nabble.com.
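One way to see where a code such as U+0081 could come from is to look at the raw UTF-8 bytes of the example string. The following is a minimal sketch in Python, not anything tidy itself runs; the variable names are illustrative, and the Latin-1 step is only an assumption about how the bytes might be getting misread somewhere along the way.

```python
# The four hiragana code points from the example above.
text = "\u3072\u3089\u304c\u306a"  # ひらがな

# Each of these characters takes three bytes in UTF-8.
utf8_bytes = text.encode("utf-8")
print(" ".join(f"{b:02x}" for b in utf8_bytes))

# Assumption for illustration: something reads the same bytes as a
# single-byte encoding (Latin-1 here), one character per byte.
misread = utf8_bytes.decode("latin-1")
print([f"U+{ord(c):04X}" for c in misread])

# True if the byte 0x81 shows up as its own "character" U+0081.
print("\u0081" in misread)
```

If U+0081 appears in the second list, the value in the warning is one of the UTF-8 bytes of the input being treated as a character by itself, which would suggest the file is not actually being read as UTF-8 at that point.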
Received on Tuesday, 16 March 2010 15:43:07 UTC