Fwd: [UTR50] Comments on Unicode PRI207 - Unicode Properties for Vertical Text Layout from fantasai on 2011-10-24 (www-style@w3.org from October 2011)

From: fantasai <fantasai.lists@inkedblade.net>
Date: Sun, 23 Oct 2011 22:01:38 -0700
To: "www-style@w3.org" <www-style@w3.org>, 'WWW International' <www-international@w3.org>
Message-ID: <4EA4F132.8090608@inkedblade.net>
This is a copy of the comments sent to Unicode on behalf of the co-editors
of CSS3 Writing Modes and CSS3 Text (Koji Ishii, Shinyu Murakami, and myself).


=== Deadlines ===

We believe the deadline for comment is too short for such a complex spec.
In particular, the new classes will take time to review codepoint-by-codepoint.
We hope therefore that Unicode plans to update the spec through multiple
review cycles until it stabilizes before publishing UTR50 as a completed spec.

We have not yet reviewed the spacing classes; nor do we have a complete
codepoint-by-codepoint analysis of the orientation classes yet prepared.
Thus we are only sending general feedback in time for the PRI closing.

=== Scope ===

UTR #50 scopes itself to Japanese layout. However, CSS needs to address all
vertical writing systems. If the scope is not broadened to include other
writing systems, we cannot rely on UTR#50.

(We define vertical writing systems as those in which entire compositions,
not just small snippets such as image captions or table headers, are commonly
written vertically.)

=== Tailoring ===

UTR #50 makes no mention of tailoring the orientations. We think the orientation
classes should be tailorable; probably Unicode agrees, but this should be more
clearly explained.

So that we don't have to manage codepoint-by-codepoint character classes, we'd
eventually like UTR#50 to include classes that are commonly tailored / not
tailored, that we can reference. Some examples:

   * class for characters that are generally not tailored, i.e. vertical-native
     scripts such as Han, Hangul, Phags-Pa etc.
   * class for characters that belong to Western writing systems (typically set
     sideways) but are often set upright as symbols, i.e. Latin, Greek, and
     Cyrillic
   * brackets, which are pretty much never tailored to upright

=== Grapheme Clusters ===

UTR #50 does not provide any rules or pointers to rules about grapheme
clusterization. We suggest referencing UAX29 and giving examples of where the
boundaries there might be adjusted (e.g. in Indian scripts).

The orientation properties need to be defined per grapheme cluster, not per
codepoint. We suggest that the properties come from the first base character,
except in the following cases:

     * Grapheme clusters formed with a combining mark of class Me should be
       treated as So in the Common script.
     * Grapheme clusters formed with a base of Zs should belong to category
       Sk and take their EAW from the space.

See also http://www.w3.org/TR/css3-writing-modes/#character-properties
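
As a rough illustration of the two exceptions above, here is a minimal Python
sketch. It assumes the input is already a single grapheme cluster (segmented
per UAX29 by some other means, since the Python standard library has no
segmenter) and that the first codepoint is the base. The function name
cluster_properties is hypothetical. Two points are assumptions rather than
anything stated above: the Me rule is checked before the Zs rule, and the Zs
rule is applied only when the space actually carries combining marks.

```python
import unicodedata


def cluster_properties(cluster: str) -> tuple[str, str]:
    """Return (general category, East Asian Width) for one grapheme cluster.

    Hypothetical helper: `cluster` must already be one grapheme cluster,
    with its base character first.
    """
    base, marks = cluster[0], cluster[1:]
    eaw = unicodedata.east_asian_width(base)

    # Exception 1: an enclosing mark (Me) makes the whole cluster behave
    # like a symbol (So). Checking this first is an assumption.
    if any(unicodedata.category(ch) == "Me" for ch in marks):
        return "So", eaw

    # Exception 2: a space (Zs) base carrying marks is treated as Sk and
    # keeps the East Asian Width of the space itself.
    if marks and unicodedata.category(base) == "Zs":
        return "Sk", eaw

    # Default: properties come from the first base character.
    return unicodedata.category(base), eaw


if __name__ == "__main__":
    samples = ["e\u0301", "A\u20DD", " \u0301", "\u3000\u0301"]
    for s in samples:
        print([f"U+{ord(c):04X}" for c in s], cluster_properties(s))
```

The Common-script part of the Me rule is not modelled here, because the
standard library does not expose the Script property.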

=== OpenType Features ===

To force consistency in orientation, UTR#50 expects vert to apply only to T
(and maybe SB) category glyphs. However, this is incompatible with many fonts
and cannot be implemented by a system that expects to correctly handle legacy
content (in other words, any content authored with currently-existing fonts).

We would need to apply vert to the U category as well in order to handle:
   * proportional and non-square (compressed) fonts, e.g. AXIS fonts [1]
   * cursive fonts

We would need to apply vert to the SB category to handle:
   * Glyph differences between vertical and horizontal writing in calligraphic
     / handwriting fonts, e.g. [2] [3]

A new font feature would be needed to apply to the S category to handle:
   * slanted fonts, e.g. susha.png
   * potential alignment differences for punctuation
   * anything else the font designer would like to vary between horizontal
     and vertical writing modes (e.g. brush stroke shapes/angles)

[1] http://www.axisfont.com/
[2] http://wiki.csswg.org/_media/spec/kodomonoji_20111005-en.png
[3] http://wiki.csswg.org/_media/spec/suzuedo.png

=== Miscategorized Scripts ===

The following scripts should be upright:

     * Hangul
     * Egyptian Hieroglyphic and its derivatives

Yi needs more investigation from someone who knows the language. Older books
are written vertically, and the glyphs seem to be rotated relative to the
Unicode code charts. However, I've seen vertical captions in horizontally-set
books printed upright.

Some examples of Yi:
   http://fantasai.inkedblade.net/style/scans/LoC008.png
   http://fantasai.inkedblade.net/style/scans/LoC011.png

Typeset Yi with vertical captions:
   http://fantasai.inkedblade.net/style/scans/LoC068.

=== Arrows and Box-Drawing ===

We suggest that arrows and box-drawing characters be set sideways by default,
as unlike other symbols, they are usually typeset in spatial relation to other
content rather than as a standalone graphic.

Box drawing characters are any characters in the U+2500–U+259F range.

Arrows are So characters in the U+2190–U+21FF, U+261A–U+261F, U+2794–U+27BE,
U+2B00–U+2B11, and U+2B45–U+2B46 ranges; and Sm characters in the U+27F0–U+297F
and U+2B30–U+2B4C ranges.

Placing arrows into the S category instead of U also relieves concerns about
inconsistent arrow orientations due to the application of 'vert' to U.
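
The ranges above could be checked roughly as follows. This is only a sketch:
the function and table names are hypothetical, and the general categories
come from whatever Unicode version the Python runtime ships with.

```python
import unicodedata

# Inclusive codepoint ranges, copied from the text above.
BOX_DRAWING = [(0x2500, 0x259F)]
ARROW_SO = [(0x2190, 0x21FF), (0x261A, 0x261F), (0x2794, 0x27BE),
            (0x2B00, 0x2B11), (0x2B45, 0x2B46)]
ARROW_SM = [(0x27F0, 0x297F), (0x2B30, 0x2B4C)]


def _in_ranges(cp: int, ranges: list[tuple[int, int]]) -> bool:
    """True if the codepoint falls inside any of the inclusive ranges."""
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_box_drawing(ch: str) -> bool:
    """Any character in the box-drawing range, whatever its category."""
    return _in_ranges(ord(ch), BOX_DRAWING)


def is_arrow(ch: str) -> bool:
    """So characters in the So ranges, or Sm characters in the Sm ranges."""
    cp, cat = ord(ch), unicodedata.category(ch)
    return ((cat == "So" and _in_ranges(cp, ARROW_SO)) or
            (cat == "Sm" and _in_ranges(cp, ARROW_SM)))


def sideways_by_default(ch: str) -> bool:
    """Proposed default for arrows and box drawing only (hypothetical name)."""
    return is_box_drawing(ch) or is_arrow(ch)


if __name__ == "__main__":
    for ch in ["\u2195", "\u27F5", "\u2500", "\u2603", "A"]:
        print(f"U+{ord(ch):04X}", sideways_by_default(ch))
```

Note that some arrows inside U+2190–U+21FF, such as U+2190 itself, have
general category Sm rather than So, so they are not matched by the arrow rule
as worded; they would fall under the math symbols discussed below.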

=== Superscripts, Subscripts, Bracket Pieces ===

We concur with the comments that suggest changing superscripts, subscripts,
and bracket pieces to S by default.

   * superscripts and subscripts
       http://www.unicode.org/forum/viewtopic.php?f=35&t=204
   * bracket pieces
       http://www.unicode.org/forum/viewtopic.php?f=35&t=206

=== Math ===

For the following reasons:

   * digits are typeset sideways by default
   * commonly used variable names (Latin, Greek) are typeset sideways by default
   * superscripts and subscripts are typically typeset sideways
   * arrows, which function as relations in math, would also be typeset
     sideways by default (see above)
   * ASCII math symbols are expected to typeset sideways
   * mathematical formulae are usually typeset sideways even in vertical text
   * the most commonly-used symbols that are intermixed with prose (× and +)
     are symmetric wrt rotation, and the equals sign (=) seems to be typeset
     sideways even when everything else is upright [4]

we suggest math symbols should be typeset sideways by default.

When intermixed in prose, variable names are often typeset upright, and in
such styles math symbols might also be typeset upright. However in these
situations some tailoring is necessary for the variable names whatever the
mathematical default, so using this style to determine the default rules in
plaintext does not make sense.

The default orientation of fullwidth math symbols is less clear, since
fullwidth characters typically provide an orientation contrast with their
ASCII counterparts; perhaps they should be U (or T for equals).

[4] http://fantasai.inkedblade.net/style/scans/ChinatownSFPL028.png
Received on Monday, 24 October 2011 05:02:11 UTC