[css3-writing-modes] real vs. synthetic width glyphs

The CSS3 Writing Modes spec defines the 'text-combine-horizontal'
property to implement short, one-glyph wide inline horizontal runs of
two or three characters, referred to as 縦中横 (tatechuyoko) in Japanese.
The common use cases for this are within dates (e.g. 12月15日) or for
small numbers (e.g. 100).

http://dev.w3.org/csswg/css-writing-modes/#text-combine-horizontal

For numerals, the common use case, authors can use either ASCII digits
(1,2,3...) or full-width digits (１、２、３...).  The default orientation
of the former is rotated, and upright for the latter, so I think a
common usage pattern will be the use of full-width digits.
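
To make the two codepoint sets concrete, here is a small Python sketch
(an illustration only, not part of the spec) using the standard
unicodedata module.  The helper name fullwidth_of is made up for this
example.  It relies on the fact that the full-width forms U+FF01..U+FF5E
sit at a fixed offset of 0xFEE0 from ASCII U+0021..U+007E, and prints the
East Asian Width class Unicode assigns to each digit:

```python
import unicodedata

FULLWIDTH_OFFSET = 0xFEE0  # distance from U+0021..U+007E to U+FF01..U+FF5E


def fullwidth_of(ch):
    """Return the full-width form of a printable ASCII character (hypothetical helper)."""
    if not 0x21 <= ord(ch) <= 0x7E:
        raise ValueError("no full-width form for %r" % ch)
    return chr(ord(ch) + FULLWIDTH_OFFSET)


for ascii_digit in "123":
    wide = fullwidth_of(ascii_digit)
    print(
        f"U+{ord(ascii_digit):04X} {unicodedata.east_asian_width(ascii_digit)}   "
        f"U+{ord(wide):04X} {unicodedata.east_asian_width(wide)}"
    )
```

The ASCII digits report the class "Na" (narrow) and the full-width digits
"F" (fullwidth), so the two sets are distinct characters to a layout
engine even though they represent the same numbers.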

For both codepoint sets, fonts may provide width-specific variants for
use with tatechuyoko which are selected by enabling the OpenType
features associated with these.  In fonts used commonly for
publishing, half-width and third-width digit variants are common,
quarter-width less so.  The combination of codepoint choice and
feature choice leads to a somewhat confusing array of choices for
rendering "simple" digits:

- default ASCII digits
- default full-width digits
- half-width variations of digits
- third-width variations of digits

Note that the default ASCII digits and the half-width variations are
*not* the same!!  Some fonts will have half-width and third-width
variations, others won't.  To implement tatechuyoko using the existing
spec, implementations can either (1) use the width variations and
scale or (2) just scale the default glyphs.

Here are some examples that show the difference between using actual
half-width and third-width variations vs. synthetic scaled versions:

Testcase:
http://people.mozilla.org/~jdaggett/tests/width-testing.html

Hiragino Mincho 
(Apple/DaiNippon, default Japanese serif on OSX/iOS)
http://lists.w3.org/Archives/Public/www-archive/2013Jul/att-0000/widths-hiragino-mincho.png

Kozuka Mincho
(Adobe)
http://lists.w3.org/Archives/Public/www-archive/2013Jul/att-0000/widths-kozuka-mincho.png

MS Mincho and MS PMincho
(Microsoft, default Japanese serif on Windows)
http://lists.w3.org/Archives/Public/www-archive/2013Jul/att-0000/widths-msmincho.png
http://lists.w3.org/Archives/Public/www-archive/2013Jul/att-0000/widths-mspmincho.png

In all of these cases, the "full-width codepoints, scaled to third"
rendering looks particularly bad.  At normal text sizes on Windows it
will be basically unreadable, particularly with DirectWrite rendering:

IE10, MS Mincho third-width renderings:
http://lists.w3.org/Archives/Public/www-archive/2013Jul/att-0000/widths-ie10-msmincho-twid.png
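
A rough calculation suggests why the "full-width codepoints, scaled to
third" case suffers most.  The sketch below assumes a full-width glyph
advances 1em and a default ASCII digit 0.5em; the second figure holds
for many Japanese fonts but is an assumption here, not a measurement of
the fonts above.  The function name horizontal_scale is made up for
this example.

```python
def horizontal_scale(glyph_count, advance_em):
    """Horizontal scale needed to fit glyph_count glyphs into 1em."""
    total_em = glyph_count * advance_em
    if total_em <= 1.0:
        return 1.0  # already fits, no compression needed
    return 1.0 / total_em


# Assumed advances: 1em for full-width glyphs, 0.5em for ASCII digits.
for label, advance_em in [("full-width", 1.0), ("ASCII", 0.5)]:
    print(f"{label}: scale {horizontal_scale(3, advance_em):.2f}")
```

Under those assumptions, three full-width digits must be squeezed to a
third of their designed width, while three ASCII digits only need to
shrink to two thirds.  Since geometric scaling thins vertical strokes by
the same factor, the full-width case loses far more stroke weight.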

I think if the spec is going to require "compression" of a tatechuyoko
run to fit the run into 1em, then the compression method should be
defined clearly.  Specifically, I don't think synthesizing tatechuyoko
from full-width glyphs should be allowed.  Allowing naive user agents
to render this way will force authors to target the "least common
denominator" rendering by avoiding full-width digits and I don't think
that's a great idea.

I would propose that the process of laying out tatechuyoko runs be:

1. Convert full-width codepoints to their default equivalents (i.e.
   full-width digits would switch to their ASCII digit equivalents)
2. Based on the length, apply the appropriate OpenType feature
   (i.e. half/third/quarter width)
3. Scale the result to 1em if necessary
4. Treat the resulting composition of glyphs as a single glyph that
   matches the metrics of typical ideographic glyphs for the font used
   (i.e. does *not* affect the size of the inline box).  The resulting
   composition of glyphs is defined to have no available substitutions
   (i.e. none of the font-variant/font-feature-settings affect the
   rendering of the composition).
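
Steps 1 to 3 above could be sketched roughly as follows.  This is only
an illustration of the proposal, not how any user agent implements it.
The function names, the font_features set and the default_advance_em
parameter are hypothetical; 'hwid', 'twid' and 'qwid' are the registered
OpenType tags for half, third and quarter widths.  The sketch also
assumes that a width-variant glyph is exactly 1/n em wide, and it does
not model step 4 (treating the result as a single glyph).

```python
# Step 2 lookup: OpenType width feature keyed by run length.
WIDTH_FEATURES = {2: "hwid", 3: "twid", 4: "qwid"}

FULLWIDTH_OFFSET = 0xFEE0  # U+FF01..U+FF5E map onto U+0021..U+007E


def to_default_width(run):
    """Step 1: replace full-width forms with their ASCII equivalents."""
    return "".join(
        chr(ord(ch) - FULLWIDTH_OFFSET) if 0xFF01 <= ord(ch) <= 0xFF5E else ch
        for ch in run
    )


def plan_tatechuyoko(run, font_features, default_advance_em=0.5):
    """Return (text, feature, horizontal_scale) for one tatechuyoko run.

    font_features: set of OpenType feature tags the font supports.
    default_advance_em: assumed advance of a default glyph, used only
        when the font lacks the matching width feature.
    """
    text = to_default_width(run)
    feature = WIDTH_FEATURES.get(len(text))
    if feature is not None and feature in font_features:
        # Assumption: n width-variant glyphs of 1/n em fill exactly 1em.
        width_em = 1.0
    else:
        feature = None
        width_em = len(text) * default_advance_em
    # Step 3: compress only if the run is wider than 1em.
    scale = 1.0 if width_em <= 1.0 else 1.0 / width_em
    return text, feature, scale


print(plan_tatechuyoko("１００", {"hwid", "twid"}))
print(plan_tatechuyoko("１００", set()))
```

With a font that has third-width glyphs, the full-width run is mapped to
ASCII and rendered with 'twid' at its designed size.  Without the
feature, the fallback scales the default ASCII glyphs rather than the
full-width ones, which is the point of step 1.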

Elika proposed something similar [1], but Koji's response was "nah,
undefined is better" [2].  However, I think if scaling to 1em is a
requirement then how that occurs must be defined explicitly.
Leaving it undefined would force authors to work around naive
implementations that simply scale whatever the content is, even if
full-width codepoints are used.  I think the examples above make it
plain that's not a good idea.

One additional note: I think the possible set of values for use with
'digits' should be limited to 2, 3, 4.  Anything larger is theoretically
possible but illegible in practice.

Regards,

John Daggett

Received on Monday, 1 July 2013 08:07:31 UTC