Re: Measuring "ideographic character face" and commonest characters

From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Date: Sun, 25 Sep 2016 11:20:55 +0900
To: fantasai <fantasai.lists@inkedblade.net>, CJK discussion <public-i18n-cjk@w3.org>
Message-ID: <6c3c246d-2112-353a-6936-632f093534bb@it.aoyama.ac.jp>

Hello Fantasai,

What I have done is download ISO-10646 (publicly available) and look at
the columns/sources. Essentially, you want a character that shows up in
all columns, with the most basic source in each (G0, HB1, T1, J0, K0, V1).
(G=China, H=Hong Kong, T=Taiwan, J=Japan, K=Korea, V=Viet Nam)
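
The column check described above could be automated roughly as follows.
This is only a sketch under assumptions: it reads lines in the
tab-separated layout used by the Unihan database's IRG source data
("U+XXXX<TAB>kIRG_GSource<TAB>G0-...") rather than the ISO 10646 code
charts themselves, it assumes every source reference is written as
prefix, hyphen, code, and the name chars_in_basic_sources is made up
for illustration.

```python
# Most basic source prefix per column, as listed in the message above.
BASIC_PREFIXES = {
    "kIRG_GSource": "G0",
    "kIRG_HSource": "HB1",
    "kIRG_TSource": "T1",
    "kIRG_JSource": "J0",
    "kIRG_KSource": "K0",
    "kIRG_VSource": "V1",
}


def chars_in_basic_sources(lines):
    """Return characters whose six columns all cite the most basic source.

    `lines` is any iterable of strings in the assumed layout
    "U+XXXX<TAB>field<TAB>value"; comments and other fields are skipped.
    """
    seen = {}  # code point -> set of fields whose value has the basic prefix
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split("\t")
        if len(parts) != 3:
            continue
        codepoint, field, value = parts
        prefix = BASIC_PREFIXES.get(field)
        if prefix is None:
            continue
        # Assumed value shape: "G0-523B" (prefix, hyphen, code).
        if value.split("-", 1)[0] == prefix:
            seen.setdefault(codepoint, set()).add(field)
    wanted = set(BASIC_PREFIXES)
    return sorted(
        chr(int(cp[2:], 16)) for cp, fields in seen.items() if fields == wanted
    )
```

Calling chars_in_basic_sources(open(path, encoding="utf-8")) on a local
copy of the data would list the characters that pass; a character that
is missing from any one column, or present only via a later source, is
dropped.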

On 2016/09/25 00:20, fantasai wrote:
> The CSS Inline spec has an appendix for synthesizing missing baseline
> information from glyph outlines of specific characters. From a
> comparison of several fonts I have on my system, it seems that the top
> and bottom bounds are best found using 丅 and 丄,

Please be careful. 丄 doesn't appear in the K or V sources, and 丅 is
also missing from the H sources.

> and the side bounds are best found using the 囗 series of
> characters.
>   http://drafts.csswg.org/css-inline/#baseline-synthesis-fonts
> The specific question I have is, which of this 囗 series is the most likely
> to be found in all CJK fonts and be a good representative for the widest
> advance?

> My guess is it's one of:
>   囗

I'd exclude this: since it doesn't contain anything, it may be drawn
somewhat smaller.

> 回因

These seem okay.

> 囯

Not in J/K/V.

> 国

Not in V.

> 困

Seems okay.

> The full series out of the CJK Unified Ideographs block in Unicode is
> 囗

See above.

The following are not in all columns:

> 囙囜囝囡团団囤囥囦囧囨囩囫囬囮囯困囲図围囵囶囸囹囻
> 囼国图囿圀圁圂圄圆圇圉圊圌圎圏圐圑圔圗圙圛圜圝圞

These seem okay:
> 囚回因固圃圈國圍園圓圖團

I'd probably go with some of the later ones rather than 回 or 因, because,
as a general tendency, the more content a character has, the higher the
chance that it is drawn a bit wider. But please check for yourself, too.
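
Once glyph bounds have been measured, that check is a small comparison.
A minimal sketch, assuming the horizontal ink bounds (x_min, x_max, in
font units) of each candidate have already been extracted from a font
by some other tool; the name widest_candidates and the input layout are
hypothetical:

```python
def widest_candidates(bounds):
    """Rank characters by ink width (x_max - x_min), widest first.

    `bounds` maps each character to an (x_min, x_max) pair in font units.
    Characters missing from the font are simply left out of the mapping.
    """
    widths = {ch: x_max - x_min for ch, (x_min, x_max) in bounds.items()}
    # Sort by descending width; ties are broken by code point order.
    return sorted(widths.items(), key=lambda item: (-item[1], item[0]))
```

Running this over several fonts and comparing the top entries would
show whether the denser characters really do come out wider.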

In all columns, but not in the most basic set:
> 园囷圚

See below:
> 四

Probably not suitable, because they may be drawn slightly differently:
> 囟囪囱圅

> Although it's undoubtedly the most common, I'm a little hesitant to use
> 四 (four) since I'm not sure if it's always drawn at the same width as
> 因 (reason) et al.

Good point. In some more cursive fonts, it will be narrower at the bottom
than at the top.

Regards,   Martin.

> ~fantasai

Martin J. Dürst
Department of Intelligent Information Technology
College of Science and Engineering
Aoyama Gakuin University
Fuchinobe 5-1-10, Chuo-ku, Sagamihara
252-5258 Japan
Received on Sunday, 25 September 2016 02:21:36 UTC