- From: Henri Sivonen <notifications@github.com>
- Date: Thu, 11 Apr 2019 15:28:42 +0000 (UTC)
- To: whatwg/encoding <encoding@noreply.github.com>
- Cc: Subscribed <subscribed@noreply.github.com>
Received on Thursday, 11 April 2019 15:29:07 UTC
> closer to a 30%, 60% and 10% distribution

Japanese Wikipedia has 46% kanji, 28% hiragana, 27% katakana, and way less than 1% half-width katakana. However, article titles in Japanese Wikipedia have 42% kanji, 5% hiragana, 53% katakana, and almost no half-width katakana.

This suggests that it's a bad idea to expect a general hiragana-to-katakana ratio if a detector only checks the first 1024 bytes of an HTML document and can expect to see the page title.

In general, looking at what happens to misinterpreted kana between Shift_JIS and EUC-JP, the kana ratio seems like a moot issue, but half-width katakana showing up is a very good indicator of having the wrong encoding.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/encoding/issues/157#issuecomment-482162553
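
To illustrate the half-width katakana point, here is a minimal sketch, not taken from any actual detector. It encodes a katakana string as EUC-JP, decodes the bytes as Shift_JIS, and measures how much of the result falls in U+FF61 to U+FF9F (half-width katakana plus the half-width punctuation that shares the same single-byte range). The function name `halfwidth_katakana_ratio` and the sample string are assumptions made for this example.

```python
def halfwidth_katakana_ratio(text: str) -> float:
    """Return the fraction of characters in U+FF61..U+FF9F.

    Hypothetical helper for illustration only. The range covers
    half-width katakana and the half-width punctuation that occupies
    the same single-byte area (0xA1..0xDF) in Shift_JIS.
    """
    if not text:
        return 0.0
    hits = sum(1 for ch in text if "\uff61" <= ch <= "\uff9f")
    return hits / len(text)


# Sample full-width katakana text (an assumption for this sketch).
original = "カタカナ"

# EUC-JP encodes each of these characters as 0xA5 followed by a byte
# in 0xA1..0xDF. Shift_JIS treats every byte in 0xA1..0xDF as a
# single-byte half-width character, so decoding with the wrong label
# turns each full-width katakana into two half-width characters.
euc_bytes = original.encode("euc_jp")
misdecoded = euc_bytes.decode("shift_jis", errors="replace")

print(halfwidth_katakana_ratio(original))
print(halfwidth_katakana_ratio(misdecoded))
```

For this sample the correctly decoded text contains no half-width characters, while the misdecoded text consists entirely of them. Real pages mix in ASCII and kanji, so the ratio would be lower, and a practical detector would likely treat this as one signal among several rather than a hard rule.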