Measuring "ideographic character face" and commonest characters from fantasai on 2016-09-24 (public-i18n-cjk@w3.org from July to September 2016)

From: fantasai <fantasai.lists@inkedblade.net>
Date: Sat, 24 Sep 2016 16:20:35 +0100
To: CJK discussion <public-i18n-cjk@w3.org>
Message-ID: <92b5d8fc-d290-e664-5555-4c4d104be073@inkedblade.net>

The CSS Inline spec has an appendix for synthesizing missing baseline information
from glyph outlines of specific characters. From a comparison of several fonts I
have on my system, it seems that the top and bottom bounds are best found using
丅 and 丄, and the side bounds are best found using the 囗 series of characters.
   http://drafts.csswg.org/css-inline/#baseline-synthesis-fonts
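
A minimal sketch of the idea above, not the spec's algorithm: given the ink bounding boxes of the probe characters, take the top edge from 丅, the bottom edge from 丄, and the side edges from a 囗-series character. The `Face` type, the `synthesize_face` helper, the choice of which edge to read from which glyph, and the numbers in `example` are all assumptions made for illustration.

```python
from typing import Dict, NamedTuple, Tuple

# Ink bounding box of one glyph outline: (xMin, yMin, xMax, yMax), in font units.
Bounds = Tuple[float, float, float, float]


class Face(NamedTuple):
    """Hypothetical container for the synthesized character-face edges."""
    left: float
    bottom: float
    right: float
    top: float


def synthesize_face(ink: Dict[str, Bounds], side_probe: str = "囗") -> Face:
    """Estimate the character face from probe glyph bounds.

    Assumption: the top edge is the yMax of 丅, the bottom edge is the
    yMin of 丄, and the side edges are the xMin/xMax of `side_probe`.
    """
    top = ink["丅"][3]
    bottom = ink["丄"][1]
    left, _, right, _ = ink[side_probe]
    return Face(left=left, bottom=bottom, right=right, top=top)


# Made-up bounds, for illustration only; not measured from any real font.
example = {
    "丅": (60, -70, 940, 800),
    "丄": (60, -80, 940, 790),
    "囗": (90, -60, 910, 780),
}
face = synthesize_face(example)
```

Real bounds would have to be read from each font's outlines, for instance with a font library such as fontTools; that step is left out here.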

My specific question is: which character in this 囗 series is most likely to be
found in all CJK fonts, and to be a good representative of the widest advance?
My guess is it's one of:
   囗回因囯国困
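
One way to test a guess like this is to check the candidates against a set of fonts. The sketch below assumes each font has already been reduced to a mapping from character to advance width; the helper names and the two sample fonts are hypothetical, and the widths are made up.

```python
from typing import Dict, Iterable, List

CANDIDATES = "囗回因囯国困"


def common_candidates(fonts: Iterable[Dict[str, int]],
                      candidates: str = CANDIDATES) -> List[str]:
    """Return the candidates that every font maps, in candidate order."""
    fonts = list(fonts)
    return [ch for ch in candidates if all(ch in font for font in fonts)]


def matches_widest(font: Dict[str, int], ch: str) -> bool:
    """True if `ch` has the largest advance among the characters in `font`."""
    return font[ch] == max(font.values())


# Made-up coverage and advances, for illustration only.
font_a = {"囗": 1000, "回": 1000, "因": 1000, "国": 1000, "困": 1000}
font_b = {"回": 1000, "因": 1000, "囯": 1000, "国": 1000, "困": 1000, "四": 980}

shared = common_candidates([font_a, font_b])
```

With these sample fonts, 囗 and 囯 drop out because each is missing from one font. Real coverage would have to come from each font's character map.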

The full series from the CJK Unified Ideographs block in Unicode is:
囗囙囚四囜囝回囟因囡团団囤囥囦囧囨囩囪囫囬园囮囯困囱囲図围囵囶囷囸囹固囻囼国图囿圀圁圂圃圄圅圆圇圈圉圊國圌圍圎圏圐圑園圓圔圖圗團圙圚圛圜圝圞
Although 四 (four) is undoubtedly the most common, I'm a little hesitant to use it,
since I'm not sure it's always drawn at the same width as 因 (reason) et al.

~fantasai

Received on Saturday, 24 September 2016 22:30:04 UTC