Re: Simplified or traditional for each Chinese macrolanguage from gfb hjjhjh on 2016-08-20 (public-i18n-cjk@w3.org from July to September 2016)

From: gfb hjjhjh <c933103@gmail.com>
Date: Sat, 20 Aug 2016 22:11:21 +0800
To: Koji Ishii <kojiishi@gmail.com>
Cc: r12a <ishida@w3.org>, Xidorn Quan <me@upsuper.org>, CJK discussion <public-i18n-cjk@w3.org>, 董福興 <bobbytung@wanderer.tw>, 劉慶 <ryukeikun@gmail.com>, Makoto Kato <m_kato@ga2.so-net.ne.jp>, John Cowan <cowan@mercury.ccil.org>
Message-ID: <CAGHjPP+h-mA=A1mAXmTWup8A3rzgBgFV2+rtGU1Ko9PzJCGyGQ@mail.gmail.com>

1. Most people I see in Cantonese discussion spaces in China write
Cantonese in simplified Chinese, although exceptions exist.
2. Choosing which font to use is a matter of region more than script:
a.) please don't assume -HK means traditional Chinese plus extra characters
from HKSCS. HK and TW in fact have different standards for how the same
characters are written, and from what I know fonts like Noto will soon
include separate HK and TW versions; b.) all of those 'extra' HKSCS
characters are now part of Unicode, so a Taiwan font may include them too;
and similarly c.) a traditional Chinese font can include simplified Chinese
characters and vice versa. The choice of font for the different languages
that fall under the Chinese macrolanguage is therefore more related to
region than to script.
3. When a zh-CN user writes in the Hant script, they might still expect to
see, or want to use, fonts designed with the mainland China glyph standard
in mind, and those fonts are sometimes referred to as simplified Chinese
fonts.
4. It could be problematic to default some Chinese variants to any version
of Chinese font without enough research. For instance, the Min Dong
Wikipedia (cdo) is mostly written in Latin script, with some rare Latin
characters that most Chinese fonts won't include. I am not familiar with
the actual usage of the language, but if most cdo users really do write it
in Latin script, then such a default could create a huge inconvenience for
them.
5. Many Chinese variants are used in multiple regions, and picking either a
traditional or a simplified (or HK/TW/CN) font as the default could anger
users in the other regions.
6. When a macrolanguage is used without any extlang subtag, the actual
language being referred to might differ from region to region. For
instance, zh-CN and zh-TW are rather clearly cmn, but zh-HK is mostly a
mixture of cmn and yue, and thus it is neither simply cmn nor simply yue.
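
To make points 2 to 4 concrete, here is a rough Python sketch of a lookup
that prefers region over script when choosing a glyph standard and declines
to guess when the tag gives no usable hint. The function names, the table
contents and the Hant-to-TW fallback are all assumptions for illustration;
this is not how any browser actually does it, and the parser is a
simplification of BCP-47 (it ignores variants and private-use subtags).

```python
# Hypothetical table: region subtag -> glyph standard the reader likely expects.
REGION_TO_GLYPH_STANDARD = {"CN": "CN", "TW": "TW", "HK": "HK"}

# Assumed fallback used only when no region is given.
SCRIPT_TO_GLYPH_STANDARD = {"Hans": "CN", "Hant": "TW"}


def parse_tag(tag):
    """Very small BCP-47 subset: returns (language, script, region)."""
    subtags = tag.split("-")
    language = subtags[0].lower()
    script = None
    region = None
    for subtag in subtags[1:]:
        if len(subtag) == 4 and subtag.isalpha() and script is None:
            script = subtag.title()
        elif len(subtag) == 2 and subtag.isalpha() and region is None:
            region = subtag.upper()
    return language, script, region


def glyph_standard(tag):
    """Pick a glyph standard, or None when no default should be imposed."""
    _language, script, region = parse_tag(tag)
    # A non-Han script (for example cdo-Latn) should not get a Chinese font.
    if script is not None and script not in SCRIPT_TO_GLYPH_STANDARD:
        return None
    # Region wins over script: zh-Hant-CN still wants mainland glyph shapes.
    if region in REGION_TO_GLYPH_STANDARD:
        return REGION_TO_GLYPH_STANDARD[region]
    if script is not None:
        return SCRIPT_TO_GLYPH_STANDARD[script]
    # Bare tags such as "yue" or "cdo": no safe default (points 4 and 5).
    return None
```

With these tables, glyph_standard("zh-Hant-CN") gives "CN" rather than
"TW", and glyph_standard("cdo-Latn") gives None.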

On 28 July 2016 at 3:55 PM, "Koji Ishii" <kojiishi@gmail.com> wrote:

> FYI, Blink and WebKit have a simple parser for BCP-47. I'm currently
> refactoring it a bit; the document is here[1] if it's of any interest.
>
> Blink converts BCP-47 to an ICU script code and uses it for font
> selection. To compute the script from BCP-47, I think the correct priority
> would be "script > lang > region", so "zh-yue-CN" should be "Hant".
>
> [1] https://codereview.chromium.org/2190833003
>
> /koji
>
> 2016-07-28 9:23 GMT+09:00 Xidorn Quan <me@upsuper.org>:
>
>> On Thu, Jul 28, 2016, at 02:00 AM, ishida@w3.org wrote:
>> > On 27/07/2016 08:18, Xidorn Quan wrote:
>> > > Richard: could you review this list as well
>> >
>> > hi Xidorn, here are some thoughts from a quick review.
>> >
>> > i think you're missing zh on its own.
>>
>> zh is something that already exists, and browsers already agree with
>> each other on it, so I ignored it. But yeah, if we put the list in some
>> document, we should include this as well.
>>
>> > Also, if this is a list of what people/applications may use to describe
>> > the language of a page, you should probably add zh-CN, zh-TW and zh-HK
>> > to your list (but not to any list in clreq).
>>
>> They would just be mapped to themselves I suppose.
>>
>> > I suppose there's also a way to match something like zh-yue-HK to
>> > zh-yue?
>>
>> I think zh-yue-HK would fall back to zh-yue automatically given the
>> list, shouldn't it?
>>
>> But if you write zh-yue-CN, hmmm...
>>
>> > I guess the use of =zh-CN rather than =zh-Hans is a legacy issue with
>> > the way things are labelled in the code?  It would be nice to use the
>> > script codes rather than region codes, if possible, since that properly
>> > expresses what is meant.
>>
>> Probably... Internally we actually map zh-Hans to zh-CN :)
>>
>> - Xidorn
>>
>
>
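
For reference, the "script > lang > region" priority Koji describes in the
quoted message can be sketched in Python roughly as follows. This is only
an illustration, not Blink's code: resolve_script and both tables are
hypothetical, the yue entry is merely what his zh-yue-CN example implies
(and point 1 above questions it), and the parser handles only a small
subset of BCP-47.

```python
# Hypothetical defaults; the "yue" entry is implied by the zh-yue-CN example.
LANG_DEFAULT_SCRIPT = {"yue": "Hant"}
REGION_DEFAULT_SCRIPT = {"CN": "Hans", "TW": "Hant", "HK": "Hant"}


def resolve_script(tag):
    """Return a script code for a language tag, or None if undetermined."""
    subtags = tag.split("-")
    languages = [subtags[0].lower()]  # primary language, then any extlang
    script = None
    region = None
    for subtag in subtags[1:]:
        if (len(subtag) == 3 and subtag.isalpha()
                and script is None and region is None):
            languages.append(subtag.lower())  # extlang, e.g. "yue" in zh-yue
        elif len(subtag) == 4 and subtag.isalpha() and script is None:
            script = subtag.title()
        elif len(subtag) == 2 and subtag.isalpha() and region is None:
            region = subtag.upper()

    # 1. An explicit script subtag always wins.
    if script is not None:
        return script
    # 2. Then the language, most specific first (extlang before "zh").
    for language in reversed(languages):
        if language in LANG_DEFAULT_SCRIPT:
            return LANG_DEFAULT_SCRIPT[language]
    # 3. Only then the region.
    return REGION_DEFAULT_SCRIPT.get(region)
```

Under these assumed tables, resolve_script("zh-yue-CN") yields "Hant"
because the language step is consulted before the region step, while
resolve_script("zh-CN") falls through to the region and yields "Hans".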

Received on Saturday, 20 August 2016 14:11:49 UTC