Re: CSS3 Text: Multi-Directional Scripts in Vertical Inline Progression

Etan Wexler wrote:
> fantasai wrote:
>>http://fantasai.tripod.com/www-style/2003/directions/vertical-bidi.html
>
> Congratulations on another clear, good-looking, and well-considered
> document. You are truly an asset to the CSS community.

Thank you for your generous compliments, Etan, but you really shouldn't
flatter me so. ;)

>>I propose that CSS3 Text provide for all three orientation
>>styles.
> 
> You have not proposed the mechanism. Do we press 'writing-mode' into service
> with new values? Do we add values to 'direction'? Do we coin a shorthand for
> 'direction' and the glyph orientation properties?

I don't think we can use writing-mode, as it's already committed to setting
progressions.

'direction' shouldn't be used because its purpose is to set the directionality
of an element's text. This is actually different from the inline progression.
Consider the following character stream:

   [start] <element>STRANGE BUT POSSIBLE</element> [end]

If for some reason this text is supposed to run right-to-left, I need to apply
a bidi override. (I assume that's what it's for.) I can do so with CSS:

   element { direction: rtl;
             unicode-bidi: override; }

The result is

   ELBISSOP TUB EGNARTS               <<<< read this way (r-l)

Note that this ordering *is correct*. It's not a stylistic effect. It is
necessary for correct interpretation of the text.
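
To make the effect of the override concrete, here is a minimal Python sketch.
The helper name is made up for this example, and it only models the simple
case above (one run with every character forced to a single direction), not
the full Unicode BIDI algorithm.

```python
def visual_order(text: str, direction: str) -> str:
    """Return the left-to-right visual order of `text` when every character
    is forced to `direction`, as with 'unicode-bidi: override'.

    Hypothetical helper: handles a single overridden run only.
    """
    if direction == "ltr":
        return text
    if direction == "rtl":
        # The first logical character ends up at the right edge of the line.
        return text[::-1]
    raise ValueError(f"unknown direction: {direction!r}")


print(visual_order("STRANGE BUT POSSIBLE", "rtl"))  # ELBISSOP TUB EGNARTS
```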

Now, let us suppose that this text is to be laid out vertically, in a
right-to-left block progression. Being rtl text, its natural orientation
is to run bottom to top with the glyph-orientation at 90deg.

   [top] ELBISSOP TUB EGNARTS [bottom]  <<< read this way (b-t)

In this case, the inline progression is also "right-to-left", if you take
the right edge to be the "top".

However, if I decide to use the "upright" style, with glyph-orientation 0deg,
the text must run

   S
   T
   R
   A
   N     read this way (t-b)
   G         |
   E         V

   B
   U
   T

   P
   O
   S
   S
   I
   B
   L
   E

Note that although the directionality is still rtl, the inline progression
is top-to-bottom, or "left-to-right".

If we assigned 'direction' to the role of controlling inline progression,
we'd have to set "direction: ltr" to have the text display "upright".
Assuming 'block-progression' remains 'rl', this is fine.

However, in most cases, 'block-progression' will not remain 'rl'. The most
common reasons would be lack of support and user prefs. The author sheets,
also, could reset 'block-progression' later in the cascade. (Such a situation
can be expected in complex style sheets.)

What happens if 'block-progression' becomes 'tb'? We get

   STRANGE BUT POSSIBLE

which is *wrong*. It's backwards.

------------------------------------------------------------------------------

The Inline Progression

The inline progression isn't necessarily another independent variable for
CSS to control. Although it is _distinct_ from directionality, it can be
described as a function of the block progression, text directionality, and
glyph orientation.

In an absolute model (degrees always clockwise), a table listing the properties
of a same-direction text run and the resulting inline progression would look
like this:

  block progression    directionality    glyph orientation  |  inline progression
  -----------------    --------------    -----------------  |  ------------------
     horizontal        Left  to right           0deg        |    top-to-bottom
     horizontal        Right to left            0deg        |    top-to-bottom
     horizontal        Left  to right          90deg        |    top-to-bottom
     horizontal        Right to left           90deg        |    bottom-to-top
     horizontal        Left  to right         180deg        |    bottom-to-top *
     horizontal        Right to left          180deg        |    bottom-to-top *
     horizontal        Left  to right         270deg        |    bottom-to-top *
     horizontal        Right to left          270deg        |    top-to-bottom *

(The headers do not represent CSS properties.)
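
The table above can also be read as a small function. This is only a sketch
in Python; the function and constant names are mine, and it covers just the
case in the table (vertical lines, a single same-direction run, degrees
measured clockwise).

```python
TOP_TO_BOTTOM = "top-to-bottom"
BOTTOM_TO_TOP = "bottom-to-top"


def inline_progression(directionality: str, glyph_orientation: int) -> str:
    """Inline progression of a same-direction run set in vertical lines.

    `directionality` is "ltr" or "rtl"; `glyph_orientation` is 0, 90, 180
    or 270 degrees clockwise. Illustrative names, not CSS properties.
    """
    if directionality not in ("ltr", "rtl"):
        raise ValueError("directionality must be 'ltr' or 'rtl'")
    if glyph_orientation == 0:
        return TOP_TO_BOTTOM   # upright glyphs always stack downward
    if glyph_orientation == 180:
        return BOTTOM_TO_TOP   # upside-down glyphs always stack upward
    ltr = directionality == "ltr"
    if glyph_orientation == 90:
        return TOP_TO_BOTTOM if ltr else BOTTOM_TO_TOP
    if glyph_orientation == 270:
        return BOTTOM_TO_TOP if ltr else TOP_TO_BOTTOM
    raise ValueError("glyph_orientation must be 0, 90, 180 or 270")
```

Only the sideways orientations (90 and 270 degrees) depend on the
directionality; for 0 and 180 degrees the result is fixed.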

You can see that the inline progressions for 180 degrees and 270 degrees don't
match the current text for 'glyph-orientation' and 'direction'.

I will say right now that I've never seen glyphs rotated 180 degrees in running
text, but I'd expect it to read bottom to top as well as have its glyphs rotated
bottom to top.

As for 270 degrees, the net effect with the current definition of
'glyph-orientation' would be similar to a bidi override after glyph selection.
Text is set as for 90 degrees, but then each glyph is rotated 180 degrees more
in place -- it's upside down with respect to the inline progression. This is,
in effect, what happened to the Farsi text in
   http://fantasai.tripod.com/www-style/2003/directions/flow-diagram2.gif
and it's unreadable.

---------------------------------------------------------------------------------

Script Types:

Scripts can be classified according to their directionality, as they
are in Unicode. Unfortunately, Unicode only defines horizontal
directionality, even though vertical and bi-orientational scripts have
vertical directionality as well. For example, while English can go
either top to bottom or bottom to top (since it doesn't have a vertical
directionality), Japanese should only go from top to bottom, even in
an 'lr' block progression. Mongolian also has top-to-bottom vertical
directionality. Unlike Japanese, however, it has no definite horizontal
directionality--only a preferred left-to-right directionality as
assigned in Unicode.

Bi-orientational scripts may be further classified by how their glyphs
transform when switching orientations. CJK characters translate; they
are always upright. Other scripts, such as Ogham and some variants of
classical Yi, must be rotated.

So, to summarize, scripts possess the following properties:

Orientation:  horizontal       (e.g. Latin)
               vertical         (e.g. Mongolian)
               bi-orientational (e.g. Han)

Horizontal directionality: left-to-right (e.g. Devanagari)
                            right-to-left (e.g. Arabic)
                            none          (e.g. Mongolian)
                                  "None" applies to vertical scripts;
                                  Unicode does assign a preferred
                                  direction, though, which should be
                                  honored by default.

Vertical directionality: top-to-bottom (e.g. Katakana)
                          bottom-to-top (e.g. Ogham)
                          none          (e.g. Arabic)
                                "None" applies to all horizontal scripts.

Bi-orientational transform: rotational    (e.g. Ogham)
                             translational (e.g. Han)

[See Appendix A for a partial table of scripts]

Characters defined as "Wide" in Unicode are treated as bi-orientational
with translational transformation.
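
For anyone who prefers to see the classification as data, here is a rough
Python sketch of the four properties as a record type. The class, field, and
function names are invented for illustration; the sample entries are taken
from the summary above and from Appendix A.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScriptProperties:
    orientation: str               # "horizontal", "vertical" or "bi-orientational"
    horizontal_dir: Optional[str]  # "ltr", "rtl", or None for vertical scripts
    vertical_dir: Optional[str]    # "tb", "bt", or None for horizontal scripts
    transform: Optional[str]       # "rotate", "translate", or None
    # Unicode's preferred direction, used when horizontal_dir is None.
    preferred_horizontal_dir: Optional[str] = None


SCRIPTS = {
    "Latin": ScriptProperties("horizontal", "ltr", None, None),
    "Arabic": ScriptProperties("horizontal", "rtl", None, None),
    "Han": ScriptProperties("bi-orientational", "ltr", "tb", "translate"),
    "Ogham": ScriptProperties("bi-orientational", "ltr", "bt", "rotate"),
    "Mongolian": ScriptProperties("vertical", None, "tb", None,
                                  preferred_horizontal_dir="ltr"),
}


def effective_horizontal_dir(script: ScriptProperties) -> Optional[str]:
    """Definite horizontal direction if there is one, else Unicode's default."""
    return script.horizontal_dir or script.preferred_horizontal_dir
```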

---------------------------------------------------------------------------------

Text-Orientation:

So that I have a clean slate to work with, I am now going to define a new
property, 'text-orientation-vertical', based on the styles described in
http://fantasai.tripod.com/www-style/2003/directions/vertical-bidi.html

An upright glyph (oriented 0 degrees) is defined as a glyph in the orientation
in which it appears in the Unicode code charts. For horizontal and bi-orientational
scripts, this is the normal orientation in horizontal text. For vertical
scripts, this is the normal orientation in vertical text.

'text-orientation-vertical' takes the following values:

'0deg'
    All glyphs are oriented upright and each line of text is laid out from
    top to bottom. There is no BIDI reordering within the element.

      In an 'lr' block progression, all directional characters are ordered as
      right-to-left characters (R) in the BIDI algorithm.

      In an 'rl' block progression, all directional characters are ordered as
      left-to-right characters (L) in the BIDI algorithm.

      In both cases the glyph orientation is 0 degrees, and any available
      vertical glyph variants should be used. Enclosing punctuation should
      thus face inward. If the font does not have vertical variants of
      such punctuation, the user agent may rotate the horizontal glyph.

'180deg'
    All glyphs are oriented upside down and each line of text is laid out
    from bottom to top. There is no BIDI reordering within the element.

      In an 'lr' block progression, all directional characters are ordered as
      left-to-right characters (L) in the BIDI algorithm.

      In an 'rl' block progression, all directional characters are ordered as
      right-to-left characters (R) in the BIDI algorithm.

      In both cases the glyph orientation is 180 degrees, and any available
      vertical glyph variants should be used.  Enclosing punctuation should
      thus face inward. If the font does not have vertical variants of
      such punctuation, the user agent may rotate the horizontal glyph.

'90deg'
    All glyphs are oriented with their tops toward the before edge of the
    block. BIDI reordering takes place.

'270deg'
    All glyphs are oriented with their tops toward the after edge of the
    block. BIDI reordering takes place, but the directions are reversed;
    that is, left-to-right characters are treated as right-to-left characters
    and right-to-left characters are treated as left-to-right characters.

'natural'
    - Vertical and translating bi-orientational scripts are handled as for
      '0deg'.
    - Rotating bi-orientational scripts are oriented as either '90deg' or
      '270deg', depending on their vertical directionality.
    - All horizontal scripts are handled as for '90deg'.
    - If the element's dominant script is a vertical or bi-orientational
      script, available vertical glyph variants should be used for
      punctuation. Otherwise, horizontal glyph variants must be used,
      rotated to a '90deg' orientation.

'left'
    Same as 'natural' except horizontal text in a right-to-left block is
    handled as for '270deg' instead of '90deg'.

'right'
    Same as 'natural' except horizontal text in a left-to-right block is
    handled as for '270deg' instead of '90deg'.

'context'
    Vertical and bi-orientational scripts are handled as for 'natural'. If
    the element's dominant script is a vertical script, all punctuation is
    handled as '0deg'.

    The BIDI algorithm is applied; however, reordering does not take place
    within the element. Instead,
      - In a top-to-bottom inline progression, all horizontal script
        characters in an even embedding level are rotated 90 degrees
        clockwise, and all horizontal script characters in an odd
        embedding level are rotated 90 degrees counterclockwise.
      - In a bottom-to-top inline progression, all horizontal script
        characters in an even embedding level are rotated 90 degrees
        /counterclockwise/, and all horizontal script characters in
        an odd embedding level are rotated 90 degrees clockwise.

Any other values are illegal and must be *ignored*, if possible.
http://www.w3.org/TR/REC-CSS2/syndata.html#ignore
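
Two of the rules above are mechanical enough to write down as code: the BIDI
class imposed by '0deg' and '180deg', and the per-level rotation used by
'context'. The following Python sketch does only that; the function names
are hypothetical, and the punctuation rules and the other values are left
out.

```python
def forced_bidi_class(value: str, block_progression: str) -> str:
    """BIDI class ("L" or "R") that '0deg' or '180deg' imposes on all
    directional characters. Raises KeyError for any other combination."""
    table = {
        ("0deg", "lr"): "R",
        ("0deg", "rl"): "L",
        ("180deg", "lr"): "L",
        ("180deg", "rl"): "R",
    }
    return table[(value, block_progression)]


def context_rotation(inline_progression: str, embedding_level: int) -> str:
    """Direction of the 90-degree rotation applied to a horizontal-script
    character under 'context', given its BIDI embedding level."""
    even = embedding_level % 2 == 0
    if inline_progression == "top-to-bottom":
        return "clockwise" if even else "counterclockwise"
    if inline_progression == "bottom-to-top":
        return "counterclockwise" if even else "clockwise"
    raise ValueError("inline progression must be vertical")
```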

---------------------------------------------------------------------------------

Notes on BIDI:

   The Unicode Bi-Directional Algorithm is applied to the whole text
   block, not individual pieces of it. 'text-orientation' is defined
   so that characters adopt an appropriate direction within the context
   of the block. This is different from SVG, where each change in glyph
   orientation ended the BIDI algorithm's block. (SVG's approach would
   fail to handle, for example, a run of upright Latin in a 90deg-
   rotated vertical Arabic sentence.)

   The Unicode BIDI algorithm is a carefully designed algorithm for
   laying out text of mixed directionality. Since rotating text can
   effectively change its directionality, it makes sense to leverage
   this algorithm for mixed-orientation layouts as well.

Embeddings:

p {
   text-orientation-vertical: context
}
x {
   direction: rtl;
   unicode-bidi: embed;
}
character stream: <p>CHINESE <x>ARABIC 1234</x> CHINESE</p>
    tb block text: <p>LLLLLLL <x>RRRRRR LLLL</x> LLLLLLL</p>
    lr block text: <p>UUUUUUU <x>RRRRRR LLLL</x> UUUUUUU</p>

In the lr block, all text will flow top to bottom. However, Us will
be upright, Rs will be rotated 90 degrees counterclockwise and Ls
will be rotated 90 degrees clockwise. Everything reads in the right
direction, but... we need to do a 180 to read the Arabic date, which
is awkward. It would be better to have the number rotated the same
way as the Arabic, and just read from bottom to top instead of top
to bottom for this little bit.  We have a clue to help us determine
when to do this: the embedding. (The text above would not order
correctly even in a tb block unless "ARABIC 1234" was embedded.)
Therefore, we can add the following rule:

  * For elements that have a 'context' text-orientation, a 'unicode-bidi'
  * value of 'embed' will cause BIDI reordering to take effect. Specifically,
  *  If 'direction' is 'ltr' the element will behave as if it had
  *    a 'right' text-orientation.
  *  If 'direction' is 'rtl' the element will behave as if it had
  *    a 'left' text-orientation.
  * The value of 'text-orientation-vertical' will still inherit as
  * 'context'.
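
In code, the rule might look like the Python sketch below. The function name
is an assumption for illustration; it returns the orientation the element
behaves as, while the value that inherits stays 'context'.

```python
def effective_text_orientation(text_orientation: str,
                               unicode_bidi: str,
                               direction: str) -> str:
    """Orientation an element behaves as, per the embedding rule.

    Only the combination 'context' + 'embed' is special-cased; every
    other combination is returned unchanged.
    """
    if text_orientation == "context" and unicode_bidi == "embed":
        if direction == "ltr":
            return "right"
        if direction == "rtl":
            return "left"
        raise ValueError(f"unknown direction: {direction!r}")
    return text_orientation
```

For the <x> element in the example (direction: rtl; unicode-bidi: embed),
this gives 'left'.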



p {
   text-orientation-vertical: natural
}
character stream: <p>MONGOLIAN "english1 MG english2" MONGOLIAN</p>
    tb block text: <p>LLLLLLLLL "LLLLLLLL LL LLLLLLLL" LLLLLLLLL</p>
    lr block text: <p>RRRRRRRRR "LLLLLLLL RR LLLLLLLL" RRRRRRRRR</p>

For this text to display correctly, the quote needs to be embedded.
This cannot be automated, since a similar text could have 'english1'
and 'english2' be discrete phrases. Also, this complexity of script
mixing is rare, so it is reasonable to expect that the author will
either set "unicode-bidi: embed" on an element around the English
quote or choose "text-orientation-vertical: context" (which
eliminates the problem by orienting all the text to flow from top
to bottom).


p,x {
   text-orientation-vertical: 0deg
}
a {
   text-orientation-vertical: natural
}
character stream: <p>LATIN1 <a>morelatin1 <x>LATIN</x> morelatin2</a> LATIN2</p>
    tb block text: <p>LLLLLL <a>LLLLLLLLLL <x>LLLLL</x> LLLLLLLLLL</a> LLLLLL</p>
    lr block text: <p>RRRRRR <a>LLLLLLLLLL <x>RRRRR</x> LLLLLLLLLL</a> RRRRRR</p>

This presents more of a problem. There are no script differences, so
the need for an embedding is not obvious. Moreover, since the change
in directional behavior is directly caused by a style change, it is
possible for the UA to handle any necessary embeddings. Should it?

----------------------------------------------------------------------

Horizontal Text:

Michel noted that I neglected horizontal text in my writeup. This was
intentional; the title, after all, was "BIDI in Vertical Context". :)
I'm not as clear on what happens in horizontal text, particularly wrt
punctuation. But, of course, horizontal text needs to be addressed as
well.

Pretty much any script in the current Unicode repertoire comfortably
fits into a horizontal line layout. Chinese, Japanese, Korean, and Yi
all switch between horizontal and vertical lines without rotation.
An exception is Mongolian, which, by the cursive nature of its script,
cannot be laid out horizontally glyph-by-glyph. It has to be rotated,
90 degrees either way. It often runs from left-to-right, but might
sometimes run right-to-left. (I think in Arabic or Hebrew contexts this
would be the preferred choice, and in some cases maybe even when by
itself, if vertical layout were not available, as this orientation
results in a layout simply rotated from the original instead of rotated
/and/ inverted.)

'text-orientation-horizontal' is mostly analogous to 'text-orientation-
vertical'. It takes the following values:

'0deg'
   All glyphs are oriented with their tops toward the block's top.
   BIDI reordering takes effect.

'180deg'
   All glyphs are oriented with their tops toward the block's bottom.
   BIDI reordering takes effect, but characters' directionalities are
   reversed.

'90deg'
   All glyphs are oriented with their tops toward the block's right edge.
   Characters are laid out from right to left. There is no reordering
   within the element.

'270deg'
   All glyphs are oriented with their tops toward the block's left edge.
   Characters are laid out from left to right. There is no reordering
   within the element.

'natural'
   Horizontal and bi-orientational scripts are handled as for '0deg'.
   Vertical scripts assigned left-to-right directionality are handled as
   for '270deg'. Vertical scripts assigned right-to-left directionality
   are handled as for '90deg'.

   If the element's dominant script is a vertical script, available
   vertical glyph variants should be used for punctuation, rotated
   appropriately. Otherwise, horizontal glyph variants must be used,
   kept at '0deg'.

'left'
   As for 'natural' except vertical scripts are always handled as for
   '270deg'.

'right'
   As for 'natural' except vertical scripts are always handled as for
   '90deg'.

'context'
   All horizontal and bi-orientational scripts are handled as for '0deg'.
   Directional characters in scripts without a definite horizontal
   directionality inherit the block's inline progression direction. So,
   if the block's inline progression goes from left to right, then
   vertical scripts are handled as for '270deg'. Otherwise, vertical
   scripts are handled as for '90deg'.

   If the element's dominant script is a vertical script, available
   vertical glyph variants should be used for punctuation, rotated
   appropriately. Otherwise, horizontal glyph variants must be used,
   kept at '0deg'.
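
As a rough Python sketch, the keyword values above resolve to a degree value
as follows. The function and parameter names are mine, and the punctuation
rules are not modeled.

```python
DEGREE_VALUES = ("0deg", "90deg", "180deg", "270deg")
KEYWORD_VALUES = ("natural", "left", "right", "context")


def horizontal_orientation(value: str,
                           script_orientation: str,
                           assigned_dir: str = "ltr",
                           inline_progression: str = "left-to-right") -> str:
    """Resolve a 'text-orientation-horizontal' value to a degree value.

    `script_orientation` is "horizontal", "vertical" or "bi-orientational".
    `assigned_dir` is the direction Unicode assigns to a vertical script.
    `inline_progression` is the block's, and is only used by 'context'.
    """
    if value in DEGREE_VALUES:
        return value
    if value not in KEYWORD_VALUES:
        raise ValueError(f"illegal value: {value!r}")
    if script_orientation != "vertical":
        return "0deg"   # horizontal and bi-orientational scripts stay upright
    if value == "left":
        return "270deg"
    if value == "right":
        return "90deg"
    if value == "natural":
        return "270deg" if assigned_dir == "ltr" else "90deg"
    # 'context': vertical scripts follow the block's inline progression
    return "270deg" if inline_progression == "left-to-right" else "90deg"
```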

---------------------------------------------------------------------------

Notes on text-orientation values:

Other, less common scripts may take different orientations when laid out.
Some variants of classical Yi, for example, are laid out top-to-bottom,
left-to-right and rotate to become right-to-left scripts when placed in
horizontal line layout. If a page uses special fonts to display classical
Yi--or Pictish Ogham, or anything else rare and unusual--it will need to
use explicit text-orientation values. This is why we need some values that
override normal behavior.

It is possible instead to redefine 'left' and 'right' to affect all
characters, not just horizontal scripts in vertical layout and vertical
scripts in horizontal layout.

---------------------------------------------------------------------------

Determining the Inline Progression of a Block:

Some of the "automatic" values for text-orientation need to know the
block's inline progression direction. I've already said that this is
not the same as the text's directionality, which is given by the
'direction' property. So what is it?

The inline progression of a block can be determined from its block-
progression, text-orientation, and direction values. Because
'direction' only specifies horizontal directionality, it may be
necessary to look up directionality based on the block's dominant
script (as given by 'text-script').


Here's the table for the automatic values as applied to horizontal scripts,
and the corresponding paragraph-level direction that will be used in the
BIDI algorithm:
(The inline progression for other values should be fairly obvious.)

  block-progression    text-orientation     direction   |  inline-progression
  -----------------    ----------------    -----------  |  ------------------
   top-to-bottom           natural             ltr      |   left-to-right (L)
   top-to-bottom           natural             rtl      |   right-to-left (R)
   left-to-right           natural             ltr      |   bottom-to-top (L)
   left-to-right           natural             rtl      |   top-to-bottom (R)
   right-to-left           natural             ltr      |   top-to-bottom (L)
   right-to-left           natural             rtl      |   bottom-to-top (R)
   top-to-bottom           context             ltr      |   left-to-right (L)
   top-to-bottom           context             rtl      |   right-to-left (R)
   left-to-right           context             ltr      |   top-to-bottom (R)
   left-to-right           context             rtl      |   top-to-bottom (R)
   right-to-left           context             ltr      |   top-to-bottom (L)
   right-to-left           context             rtl      |   top-to-bottom (L)

'context' is the only style in which the glyph orientation depends on the
inline progression rather than the other way around. Thus it is the only
style which does not intrinsically require a certain inline progression.
You'll notice, however, that it always chooses top-to-bottom for a vertical
flow block. This is because horizontal scripts have a bias for going from
top to bottom. (It /is/ their secondary direction, after all.)
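
The table for horizontal scripts can be kept as a plain lookup. Here is a
Python sketch; the names are illustrative, and the second element of each
entry is the paragraph-level direction handed to the BIDI algorithm.

```python
from typing import Dict, Tuple

# (block-progression, text-orientation, direction)
#     -> (inline progression, paragraph-level BIDI direction)
INLINE_PROGRESSION: Dict[Tuple[str, str, str], Tuple[str, str]] = {
    ("top-to-bottom", "natural", "ltr"): ("left-to-right", "L"),
    ("top-to-bottom", "natural", "rtl"): ("right-to-left", "R"),
    ("left-to-right", "natural", "ltr"): ("bottom-to-top", "L"),
    ("left-to-right", "natural", "rtl"): ("top-to-bottom", "R"),
    ("right-to-left", "natural", "ltr"): ("top-to-bottom", "L"),
    ("right-to-left", "natural", "rtl"): ("bottom-to-top", "R"),
    ("top-to-bottom", "context", "ltr"): ("left-to-right", "L"),
    ("top-to-bottom", "context", "rtl"): ("right-to-left", "R"),
    ("left-to-right", "context", "ltr"): ("top-to-bottom", "R"),
    ("left-to-right", "context", "rtl"): ("top-to-bottom", "R"),
    ("right-to-left", "context", "ltr"): ("top-to-bottom", "L"),
    ("right-to-left", "context", "rtl"): ("top-to-bottom", "L"),
}


def lookup_inline_progression(block_progression: str,
                              text_orientation: str,
                              direction: str) -> Tuple[str, str]:
    """Look up one row of the table; raises KeyError for unlisted rows."""
    return INLINE_PROGRESSION[(block_progression, text_orientation, direction)]
```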


Since 'direction' only gives the horizontal directionality, it is
necessary to look up the dominant script (as given by 'text-script')
to determine the block's inline progression for 'lr' and 'rl'
block progressions.

Most East Asian scripts
   - are bi-orientational (They behave as horizontal scripts in horizontal
     lines and as vertical scripts in vertical lines.)
   - read from top to bottom in vertical lines
   - use a translational transform between orientations rather than a rotation
Of the list in UAX 24, these scripts include Han, Hangul, Bopomofo, Katakana,
Hiragana, and Yi. As the dominant script of a vertical flow block, they give
the block a top-to-bottom inline progression for both 'natural' and 'context'
text orientations.

Mongolian
   - is a vertical script
   - reads from top to bottom in vertical lines
As with the other top-to-bottom East Asian scripts, 'natural' and 'context'
text orientations with Mongolian result in a top-to-bottom inline progression.

Ogham
   - is bi-orientational
   - reads from bottom to top in vertical lines
   - uses a *rotational* transform between orientations
As the dominant script of a vertical flow block, it gives the block a
**bottom-to-top** inline progression for both 'natural' and 'context'.

If I am not mistaken, all other scripts in UAX 24 should be classified
as horizontal. (I don't know about Canadian Aboriginal, though.)
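
Here is the same classification as a small Python lookup, for the 'natural'
and 'context' values in a vertical flow block. The names are made up for
this sketch, and scripts not listed here (the horizontal ones) return None
because their inline progression depends on 'direction' instead.

```python
from typing import Optional

# Inline progression given by the dominant script of a vertical flow block.
DOMINANT_SCRIPT_PROGRESSION = {
    "Han": "top-to-bottom",
    "Hangul": "top-to-bottom",
    "Bopomofo": "top-to-bottom",
    "Katakana": "top-to-bottom",
    "Hiragana": "top-to-bottom",
    "Yi": "top-to-bottom",
    "Mongolian": "top-to-bottom",
    "Ogham": "bottom-to-top",
}


def dominant_script_progression(script: str) -> Optional[str]:
    """Inline progression for a listed script, or None if it is not listed."""
    return DOMINANT_SCRIPT_PROGRESSION.get(script)
```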

------------------------------------------------------------------------------

Note on Vertical Scripts and Font Systems:

On font systems where fonts for vertical scripts are designed for
horizontal layout (i.e. unrotated glyphs are sideways), the actual
picture in the font will be oriented upright or upside-down in
horizontal layout and use the font's horizontal metrics. In vertical
layout it will be oriented sideways. In either case, the *form* of
the glyph follows the rules outlined above--sideways in horizontal
layout and upright in vertical layout.

------------------------------------------------------------------------------

text-orientation vs. glyph-orientation

We now have a system defined with 'block-progression', 'text-orientation',
and 'direction'. What about 'glyph-orientation'?

Michel suggested that the glyph-orientation properties should take
the role of 'text-orientation' by adding values for 'upright' and
'inline' ("context") and letting 'auto' fill in for "natural".

However, the definitions for text-orientation and glyph-orientation
don't quite match. (There's also the fact that having a property
named "glyph-orientation" reorder content instead of just rotating
glyphs is IMO just not intuitive.)

The problem with glyph-orientation is that it does not reverse characters'
directionality when it reverses their glyphs.

The advantage of glyph-orientation is that it does not reverse
directionality when it reverses glyphs.

It can be used for decorative purposes without adversely affecting the
character order of the text. So, it can be used in combination with
text-orientation for
       - weird graphical effects
       - decorative tilts
       - displaying non-standardized scripts
       - anything else that requires more precise control over glyph layout

----------------------------------------------------------------------------

text-orientation and glyph-orientation

Some rules for the interaction of glyph-orientation and text-orientation:
   1. If glyph-orientation is 'auto', the glyph's orientation
      is determined by the text-orientation value.
   2. If text-orientation is 'natural', 'glyph-orientation'
      affects the text order as specified in SVG 1.1.
   3. Otherwise, the glyph-orientation value gives the
      exact orientation of all glyphs in the element and
      has no effect on character order.
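
The three rules can be sketched in Python as below. The function name and
the shape of the return value are assumptions made for this example; it
reports which property fixes the glyph orientation and whether
'glyph-orientation' itself affects text order.

```python
from typing import Tuple


def glyph_orientation_role(glyph_orientation: str,
                           text_orientation: str) -> Tuple[str, bool]:
    """Return (property that fixes glyph orientation,
               whether 'glyph-orientation' affects text order)."""
    if glyph_orientation == "auto":
        # Rule 1: text-orientation decides everything.
        return ("text-orientation", False)
    if text_orientation == "natural":
        # Rule 2: glyph-orientation behaves as specified in SVG 1.1.
        return ("glyph-orientation", True)
    # Rule 3: exact orientation for all glyphs, character order untouched.
    return ("glyph-orientation", False)
```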

-----------------------------------------------------------------------------

Mirroring Scripts
-----------------

Ancient Egyptian hieroglyphs could be written in lines going either from
right to left or left to right. A distinctive characteristic of the lines
was that the glyphs *faced* the beginning of the line. IIRC, Egyptian is
not the only script to behave this way, and it would be best to handle
such scripts by allowing CSS to style the direction and mirror the glyphs.

'text-orientation-horizontal: mirror'
   As for 'natural', except directionality is reversed and each glyph is
   mirrored across its vertical central axis.
'glyph-orientation-horizontal: mirror'
   Each glyph is mirrored across its vertical central axis.

We can also add to 'context' -
   If the inline progression is right to left, mirroring scripts are
   handled as 'mirror'.

-----------------------------------------------------------------------------

'writing-mode'

As I have explained before[1], using 'direction' to control the inline
progression can result in strange text displays. 'writing-mode', as
it's currently defined, also sets 'direction'--which will be a problem
if 'writing-mode' comes into common use. Most of the time, the author
using "writing-mode: tb-rl" doesn't want to change the 'direction',
just the block progression. Therefore, "writing-mode: tb-rl" /shouldn't/
change the 'direction'. Since it says "top to bottom, right to
left", we can have this shorthand expand to
   block-progression: rl; text-orientation: context
which will do what the author wants and not affect 'direction'.
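
A minimal Python sketch of that expansion follows. The names are
hypothetical, and only 'tb-rl' is filled in because it is the only value
discussed here.

```python
from typing import Dict

# Shorthand expansions for 'writing-mode'. Only 'tb-rl' is given above.
WRITING_MODE_EXPANSIONS: Dict[str, Dict[str, str]] = {
    "tb-rl": {"block-progression": "rl", "text-orientation": "context"},
}


def expand_writing_mode(value: str) -> Dict[str, str]:
    """Expand a 'writing-mode' value into longhand declarations.

    Returns a copy so callers cannot modify the table. Raises KeyError
    for values this sketch does not define.
    """
    return dict(WRITING_MODE_EXPANSIONS[value])
```

Note that 'direction' never appears in the result, which is the point of
the change.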

------------------------------------------------------------------------------

Appendix A: Partial Table of Script Classifications (Informative)

                                  Directionality
                         Vertical            Horizontal        Transform
-----------------------------------------------------------------------------
Latin                    none                   ltr               --
Cyrillic                 none                   ltr               --
Greek                    none                   ltr               --
Arabic                   none                   rtl               --
Hebrew                   none                   rtl               --
Devanagari               none                   ltr               --
Tibetan                  none                   ltr               --
Thai                     none                   ltr               --
Han                       tb                    ltr            translate
Hangul                    tb                    ltr            translate
Hiragana                  tb                    ltr            translate
Katakana                  tb                    ltr            translate
Yi                        tb                    ltr            translate
Hanunoo                  none                   ltr               --
Mongolian                 tb                 none (ltr)           --
Ogham                     bt                    ltr             rotate


HTML version and references provided upon request.

Acknowledgements: Thanks once again to Martin Heijdra for taking time out
                   of his busy day to discuss scripts and layout with me. :)

[1] http://lists.w3.org/Archives/Public/www-style/2003Apr/0045.htm

~fantasai

Received on Monday, 7 April 2003 11:02:21 UTC