- From: Ambrose LI <ambrose.li@gmail.com>
- Date: Wed, 27 Jul 2016 00:09:59 -0400
- To: Xidorn Quan <me@upsuper.org>
- Cc: Koji Ishii <kojiishi@gmail.com>, John Cowan <cowan@mercury.ccil.org>, 董福興 <bobbytung@wanderer.tw>, CJK discussion <public-i18n-cjk@w3.org>, Makoto Kato <m_kato@ga2.so-net.ne.jp>, 劉慶 <ryukeikun@gmail.com>
2016-07-26 23:34 GMT-04:00 Xidorn Quan <me@upsuper.org>:
> On Wed, Jul 27, 2016, at 12:13 PM, Koji Ishii wrote:
>
>> So Literary Chinese and Mandarin are hard to determine? I checked Windows
>> region/locale/language settings but it doesn't seem to have these in the
>> list.
>
> Mandarin is just what we generally refer to by saying "Chinese". Literary is
> a historic language of Chinese, which is not used in daily life nowadays.
>
>> Maybe we should handle them as "unknown", so that browsers fall back to
>> the system setting?
>
> What should zh (without anything else) do, actually? What happens to that
> should probably be what we do for Mandarin and Literary.

zh defaults to simplified, right? I assume zh-cmn isn't really that common?
Maybe we need to tell people to just skip tagging text as Mandarin and use
zh-cmn-hant and zh-cmn-hans instead.

But come to think of it, I actually like the idea of treating Mandarin as
unknown. I think the semantics shouldn't be "fall back to the system
setting", though, but something more like "inherit if possible, else fall
back to the system setting". What I have in mind is a use case to the effect
of

  <html lang=zh-hant>
  [...]
  We call this <span lang=zh-yue>xxx</span>. However, in Mandarin the same
  thing is called <span lang=zh-cmn>yyy</span>.

(Obviously, this would also apply to other dialects. For example, in a
simplified Chinese text describing language differences, a term tagged zh-yue
is most likely also in simplified Chinese, not traditional Chinese.)

There are obviously other possibilities, such as an entire audio transcript
tagged as "Mandarin". In this scenario there would be no script to inherit
from and we'll probably have to guess.

Thoughts?

>> FYI, Wikipedia[1] already uses "lzh", without script.
>
> If we use Wikipedia as the criterion, the list would significantly change.
> Basically as far as I can see, Wikipedia uses Traditional Chinese in almost
> every Chinese language it has a version for. But I suspect that most of
> those Wikipedias are built by language enthusiasts, and not used by people
> in general, so I tend not to pick that as a criterion.

I only know the zh, lzh, and yue versions. FWIW, IMHO traditional for lzh
makes a lot of sense because mapping back from simplified to traditional is
problematic. That's true for zh as well, but I guess since so many people use
simplified these days, accepting simplified is unavoidable.

For yue, I mentioned that zh-yue-hant and zh-yue-hans use different
conventions. To put it another way, if you do a Unicode-based conversion from
zh-yue-hant to zh-yue-hans you essentially get gibberish, and if you do it in
the other direction you also get gibberish. This is even worse than the usual
conversion between zh-hans and zh-hant, so I assume they just had to pick one
and stick with it.

> But on the other hand, I guess those language tags are almost only used in
> Wikipedia, and not anywhere else...

I've seen lzh used in software. So it (as a valid ISO language code)
certainly is being used elsewhere, though probably very rarely.

--
Ambrose Li // http://o.gniw.ca / http://gniw.ca
If you saw this on CE-L: You do not need my permission to quote me, only
proper attribution. Always cite your sources, even if you have to anonymize
and/or cite it as "personal communication".
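The "inherit if possible, else fall back to the system setting" rule proposed in this message can be sketched as a small script-resolution function. This is a hypothetical illustration, not how any browser actually behaves: the names `split_tag`, `resolve_script` and `LANGUAGE_DEFAULT_SCRIPT` are invented here, giving only lzh a fixed default (Traditional) is an assumption drawn from the opinions in the thread, and the tag parsing handles only the language, extlang and script subtags of BCP 47.

```python
from typing import Optional, Tuple

# Hypothetical per-language defaults for tags that carry no script subtag.
# Only lzh is listed (assumed Traditional); zh, cmn and yue are deliberately
# absent so that they are treated as "unknown" and inherit instead.
LANGUAGE_DEFAULT_SCRIPT = {"lzh": "Hant"}


def split_tag(tag: str) -> Tuple[str, Optional[str]]:
    """Return (language, script) for a BCP 47 tag.

    Extlang forms such as zh-cmn are folded into the extlang code (cmn).
    Parsing stops at a singleton, where extensions or private use begin.
    """
    subtags = tag.lower().split("-")
    language = subtags[0]
    script = None
    for i, sub in enumerate(subtags[1:], start=1):
        if len(sub) == 1:  # singleton such as "x": stop parsing
            break
        if i == 1 and len(sub) == 3 and sub.isalpha():
            language = sub  # extlang, e.g. zh-yue -> yue
        elif len(sub) == 4 and sub.isalpha():
            script = sub.title()  # e.g. "hant" -> "Hant"
            break
    return language, script


def resolve_script(tag: str, inherited: Optional[str], system_default: str) -> str:
    """Pick a script for text tagged `tag`.

    1. An explicit script subtag wins.
    2. Then a per-language default, if one is configured.
    3. Otherwise inherit the script resolved for the enclosing element.
    4. Only if there is nothing to inherit, use the system setting.
    """
    language, script = split_tag(tag)
    if script is not None:
        return script
    if language in LANGUAGE_DEFAULT_SCRIPT:
        return LANGUAGE_DEFAULT_SCRIPT[language]
    if inherited is not None:
        return inherited
    return system_default


if __name__ == "__main__":
    # The <html lang=zh-hant> example: both spans inherit Traditional.
    page = resolve_script("zh-hant", None, "Hans")
    print(page, resolve_script("zh-yue", page, "Hans"), resolve_script("zh-cmn", page, "Hans"))
    # A standalone transcript tagged zh-cmn has nothing to inherit from.
    print(resolve_script("zh-cmn", None, "Hans"))
```

With this ordering, a zh-cmn span inside a zh-hant page resolves to Traditional, while a top-level zh-cmn transcript falls through to the system setting. Whether per-language defaults should outrank inheritance (as lzh does here) is a design choice the thread leaves open.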
Received on Wednesday, 27 July 2016 04:11:09 UTC