Re: [presentation-api] Possibility for a character to be interpreted differently depending on locale

From: Martin Dürst via GitHub <sysbot+gh@w3.org>
Date: Thu, 12 Nov 2015 06:11:30 +0000
To: www-international@w3.org
Message-ID: <issue_comment.created-156012188-1447308689-sysbot+gh@w3.org>
I concur with @aphillips, but have some additional comments.

First, it's not really about differences between Simplified and
Traditional Chinese, because these are distinguished by using
different code points when the difference is significant (e.g. 区 vs.

The differences in Han character (kanji) shapes between mainland
China, Taiwan/HK, Japan, and so on are much finer, much closer to the
level of font differences. Users in each country/region are used to
reading these characters in their preferred shapes, but basic
readability isn't affected when another shape is used. As an example,
I occasionally get email that displays Japanese text with a Chinese
font; although this is somewhat suboptimal, it's nevertheless readable
without problems.
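The distinction drawn above can be checked directly at the code-point level. In the minimal sketch below (standard-library `unicodedata` only), a Simplified/Traditional pair occupies two different code points, while a character whose preferred shape differs between Chinese and Japanese typography is one unified code point whose shape is left to the font. The example characters 国/國 and 直 are illustrative choices, not taken from the thread, and `describe` is a hypothetical helper.

```python
import unicodedata

def describe(ch: str) -> str:
    """Hypothetical helper: return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# Simplified vs. Traditional: two distinct code points, so plain text
# already carries the difference without any language tagging.
simplified, traditional = "国", "國"
print(describe(simplified))
print(describe(traditional))
assert simplified != traditional

# Regional glyph variation: Chinese and Japanese text share the same
# code point for 直; only the font decides which shape gets drawn.
print(describe("直"))
```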

For cross-language content, I can imagine different readers having
different preferences. I don't read Chinese, and so cannot speak from
direct experience, but I can well imagine that a native Japanese
reader might want to see even Chinese content displayed in a Japanese
font, and vice versa, because such a reader is more familiar with the
glyph shapes of their native fonts. The main problem would be cases
where, e.g., some Chinese characters are not available in the Japanese
font; this might lead to a ransom note effect, which of course would
be undesirable.
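The trade-off described above (preferring a Japanese font even for Chinese text, but falling back when a character is missing) can be sketched as per-character font selection. The font names and coverage sets below are hypothetical stand-ins; a real renderer would query each font's character map. The sketch only shows how every fallback splits the text into runs drawn in different fonts, which is what produces the ransom-note look.

```python
from itertools import groupby
from typing import List, Optional, Tuple

# Hypothetical coverage sets; a real implementation would read each
# font's character map instead of hard-coding characters.
FONT_COVERAGE = {
    "HypotheticalJapaneseFont": set("中文漢字国語"),
    "HypotheticalChineseFont": set("中文汉字国语"),
}

def pick_font(ch: str, preferred: str, fallbacks: List[str]) -> Optional[str]:
    """Return the first font (preferred font first) that covers ch."""
    for font in [preferred, *fallbacks]:
        if ch in FONT_COVERAGE[font]:
            return font
    return None  # no font covers it; a renderer would show a .notdef box

def font_runs(text: str, preferred: str,
              fallbacks: List[str]) -> List[Tuple[Optional[str], str]]:
    """Group consecutive characters that end up in the same font."""
    chosen = [(pick_font(ch, preferred, fallbacks), ch) for ch in text]
    return [(font, "".join(ch for _, ch in group))
            for font, group in groupby(chosen, key=lambda pair: pair[0])]

runs = font_runs("中文汉字", "HypotheticalJapaneseFont",
                 ["HypotheticalChineseFont"])
print(runs)
```

With these assumed coverage sets, the four characters come out as three runs: 中文 and 字 in the Japanese font, and the Simplified-only 汉 in the Chinese fallback font.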

As for the different ways of showing U+005C, this is a hopeless
leftover from a time when there were a lot of local code pages. I
remember a time when, on a German system, the {} frequently used in C
were displayed as umlauts (ä and ü); the reader had to do the rest
mentally. Unfortunately, the yen/won sign effect won't disappear soon,
because getting rid of it would require a Y2K-like effort without a
deadline.
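The local code pages mentioned above go back to the national variants of ISO 646, which reassigned a few 7-bit ASCII positions to local characters. The sketch below decodes the same bytes under several of those variants. The override tables list only the positions relevant to this discussion (JIS X 0201 Roman for Japan, KS X 1003 for Korea, DIN 66003 for Germany); they are not complete codec implementations, and `decode_variant` is a hypothetical helper.

```python
# Only positions that differ from US-ASCII and matter here; every
# other byte is decoded as plain ASCII.
ISO646_OVERRIDES = {
    "US-ASCII": {},
    "JIS X 0201 Roman": {0x5C: "\u00A5", 0x7E: "\u203E"},  # YEN SIGN, OVERLINE
    "KS X 1003": {0x5C: "\u20A9"},                        # WON SIGN
    "DIN 66003": {0x5B: "Ä", 0x5C: "Ö", 0x5D: "Ü",
                  0x7B: "ä", 0x7C: "ö", 0x7D: "ü"},
}

def decode_variant(data: bytes, variant: str) -> str:
    """Decode 7-bit bytes as the given national ISO 646 variant."""
    overrides = ISO646_OVERRIDES[variant]
    return "".join(overrides.get(b, chr(b)) for b in data)

source = b"{ path = C:\\dir; }"  # the bytes are: { path = C:\dir; }
for variant in ISO646_OVERRIDES:
    print(f"{variant:17} {decode_variant(source, variant)}")
```

The byte 0x5C never changes; only its interpretation does, which is why the same file shows a backslash, a Yen sign, a Won sign, or Ö depending on where it is viewed.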

I strongly suggest recommending the following wherever appropriate:
"To denote the Japanese Yen, replace U+005C with U+00A5 (¥, Yen sign
in Latin-1), U+FFE5 (￥, fullwidth Yen sign), 円 (kanji for Yen),...,
and to denote the Korean Won, replace U+005C with U+FFE6 (￦, fullwidth
Won sign),... Only use U+005C for a syntactic backslash (e.g. in
programming languages). (At least in Japan, readers of program code
are used to seeing a Yen sign where other readers would expect a
backslash.)" Unfortunately, this is the best advice available at the
moment. Relying on language tagging for this issue is neither
appropriate nor safe enough.
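The recommendation above amounts to keeping U+005C for syntax and using a dedicated currency character whenever a Yen or Won sign is meant. The minimal sketch below (standard library only) confirms that the suggested characters are distinct code points, and shows that NFKC normalization folds the fullwidth forms onto the ordinary signs while never turning a backslash into a currency sign. The helper `mark_currency` is hypothetical: it only rewrites text the caller already knows to be a price, because a program cannot reliably tell a currency backslash from a syntactic one.

```python
import unicodedata

BACKSLASH = "\u005C"       # REVERSE SOLIDUS: keep for syntax only
YEN_SIGN = "\u00A5"        # YEN SIGN (Latin-1)
FULLWIDTH_YEN = "\uFFE5"   # FULLWIDTH YEN SIGN
YEN_KANJI = "\u5186"       # 円
FULLWIDTH_WON = "\uFFE6"   # FULLWIDTH WON SIGN

for ch in (BACKSLASH, YEN_SIGN, FULLWIDTH_YEN, YEN_KANJI, FULLWIDTH_WON):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# NFKC folds the fullwidth forms onto the ordinary signs, but leaves
# the backslash alone.
assert unicodedata.normalize("NFKC", FULLWIDTH_YEN) == YEN_SIGN
assert unicodedata.normalize("NFKC", FULLWIDTH_WON) == "\u20A9"  # WON SIGN
assert unicodedata.normalize("NFKC", BACKSLASH) == BACKSLASH

def mark_currency(price_text: str, sign: str = YEN_SIGN) -> str:
    """Hypothetical helper: for text the caller KNOWS is a price (not
    program code), replace every U+005C with an explicit currency sign."""
    return price_text.replace(BACKSLASH, sign)

print(mark_currency("\\1,000"))                 # ¥1,000
print(mark_currency("\\5,000", FULLWIDTH_WON))  # ￦5,000
```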

GitHub Notification of comment by duerst
Received on Thursday, 12 November 2015 06:11:32 UTC
