Re: ligature formation across text chunks from Glenn Adams on 2011-05-17 (public-svg-wg@w3.org from April to June 2011)

From: Glenn Adams <glenn@skynav.com>
Date: Mon, 16 May 2011 23:05:42 -0600
To: Cameron McCormack <cam@mcc.id.au>
Cc: Vincent Hardy <vhardy@adobe.com>, "public-svg-wg@w3.org" <public-svg-wg@w3.org>
Message-ID: <BANLkTi=jOFFWhKYW63HuSWbFgFjHp_u9fg@mail.gmail.com>

Cameron,

Perhaps I wasn't clear, but in this example (see Example #1 in [1]), 12
Devanagari Unicode characters map to 4 glyphs in a Devanagari font,
where:

   - the 1st glyph, a base glyph denoting a vowel, is derived from the char
   at index 10
   - the 2nd glyph, a base glyph denoting the half form of a nukta-ized
   consonant, is derived from chars at index 2, 3, and 4
   - the 3rd glyph, a base glyph denoting a ligature of three consonants, is
   derived from chars at index 5, 6, 7, 8, and 9
   - the 4th glyph, a combining glyph denoting a ligature of two combining
   marks, is derived from chars at index 0, 1, and 11
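
The association listed above can be written down as a small table. The
following Python sketch is only an illustration (the names GLYPH_TO_CHARS,
invert, and is_contiguous are made up here, not taken from any
specification). It records which character indices generate each glyph,
using 0-based glyph indices, builds the inverse character-to-glyph map, and
flags glyphs whose generating characters are not adjacent in the text.

```python
# Glyph index -> generating character indices, as listed above
# (0-based glyph indices; all names here are illustrative only).
GLYPH_TO_CHARS = {
    0: [10],
    1: [2, 3, 4],
    2: [5, 6, 7, 8, 9],
    3: [0, 1, 11],
}
CHAR_COUNT = 12


def invert(glyph_to_chars):
    """Return a char index -> glyph index map, rejecting chars used twice."""
    char_to_glyph = {}
    for glyph, chars in glyph_to_chars.items():
        for c in chars:
            if c in char_to_glyph:
                raise ValueError(f"char {c} maps to more than one glyph")
            char_to_glyph[c] = glyph
    return char_to_glyph


def is_contiguous(chars):
    """True if the char indices form one unbroken run."""
    ordered = sorted(chars)
    return ordered == list(range(ordered[0], ordered[-1] + 1))


CHAR_TO_GLYPH = invert(GLYPH_TO_CHARS)

# Glyph index for each character, in logical character order.
GLYPH_PER_CHAR = [CHAR_TO_GLYPH[c] for c in range(CHAR_COUNT)]

# Glyphs whose generating characters are not adjacent in the text.
NON_CONTIGUOUS = [g for g, cs in GLYPH_TO_CHARS.items() if not is_contiguous(cs)]
```

In this example only the combining glyph (index 3) is generated from a
non-contiguous set of characters, and the glyph order does not follow the
character order.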

At a minimum, an author should be able to perform the character to glyph
mapping (and origin assignment) at authoring time, and use SVG to display
the four glyphs at their desired locations. Since the glyph identifiers
(glyph codes) used to refer to these glyphs are not in Unicode (because
these are glyphs and not chars), some means other than Unicode
characters should be available to serve this function. That is my
understanding of the purpose of <glyphRef/>.

Further, the authoring tool may wish to output these glyphs in an order that
is distinct from either the original logical character order or from the
visual order. For example, it is most convenient to interchange the 3rd and
4th glyphs, so that the combining mark can be assigned an advancement of 0,
and followed by the base glyph whose origin is derived from the advancement
and origin of the prior (2nd) base glyph. Otherwise, the origin of the
combining glyph would need to take into account the advancement of the base
glyph on which it is to be attached.
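
To make the ordering argument concrete, here is a minimal Python sketch of
pen-based glyph placement. The Glyph class, the place function, and every
metric value are assumptions invented for illustration; they do not come
from a real font or from any SVG or FOP API. It places the four glyphs
twice, once with the zero-advance mark emitted before its base and once
after it, and shows that only the second case needs the base advancement
folded into the mark's offset.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    name: str
    advance: float          # x advancement applied after the glyph is placed
    x_offset: float = 0.0   # origin offset relative to the current pen position
    y_offset: float = 0.0


def place(glyphs, start_x=0.0, baseline_y=0.0):
    """Return (name, x, y) origins by walking a pen along the baseline."""
    pen_x = start_x
    placed = []
    for g in glyphs:
        placed.append((g.name, pen_x + g.x_offset, baseline_y + g.y_offset))
        pen_x += g.advance
    return placed


# All metrics below are made-up values for illustration.
vowel = Glyph("vowel", advance=300)
half = Glyph("half-form", advance=400)
conjunct = Glyph("conjunct", advance=700)

# Mark emitted before its base with zero advance: the pen still sits at the
# base glyph's origin, so the offset is simply relative to that origin.
mark_first = Glyph("mark", advance=0, x_offset=350, y_offset=-600)
output_order = place([vowel, half, mark_first, conjunct])

# Mark emitted after its base: the pen has already moved past the base, so
# the offset has to back out the base glyph's advancement.
mark_last = Glyph("mark", advance=0, x_offset=350 - conjunct.advance, y_offset=-600)
visual_order = place([vowel, half, conjunct, mark_last])
```

Both sequences yield the same origins for all four glyphs; the difference
is only in which one requires knowledge of the base glyph's advancement
when computing the mark's offset.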

Now, it appears that <text/>, etc., are defined to operate in the character
domain, and not the glyph domain. However, attributes such as x, y, dx, dy,
and rotate are clearly applicable to the glyph domain, and not the character
domain. As long as one works with simple writing systems that have a
general 1:1 character to glyph mapping, then this doesn't appear to be a
problem. However, in complex scripts, such as Arabic, and the family of
Indic scripts, this is a rather serious problem of mixing apples and
oranges, since a 1:1 mapping is the exception, not the rule.

However, perhaps I am missing something here, so the problem may merely be
my lack of full understanding of the defined mechanisms.

One reason I am particularly interested in this at the moment is that I am
in the process of adding Complex Script Support [2] to the Apache FOP
Project [3], an implementation of XSL-FO, and in due course, I need to
output glyphs with x and y origin offsets, and x and y advancements, on a
per glyph basis. I already have this process working for PDF output, but SVG
is on my list for output support, so I shall presently have to deal with
SVG.

Whether SVG can natively support both the necessary bidi processing and the
complex character to glyph mapping (via OpenType advanced typography tables
or their equivalent in TrueType) remains a question. But that is the problem you
have if you want SVG to handle character to glyph mapping even at line
level. Note that in general, bidi processing operates first in the character
domain on an entire paragraph, then secondly in the glyph domain (after line
breaking) when reordering bidi embedding level segments. Note also that line
breaking generally operates on the glyph domain, since characters have no
geometry (only glyphs do).
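
The post-line-breaking reordering step mentioned above corresponds to rule
L2 of the Unicode Bidirectional Algorithm (UAX #9): within a line, from the
highest embedding level down to the lowest odd level, every contiguous run
at that level or higher is reversed. The Python sketch below assumes the
embedding levels have already been resolved and the line has already been
broken; the function name reorder_line is illustrative, and mirroring and
the other line-level rules are not handled.

```python
def reorder_line(items, levels):
    """Reorder one line's items into visual order from resolved bidi levels.

    items  -- glyphs (or characters) of a single line, in logical order
    levels -- resolved embedding level of each item, same length as items
    """
    if len(items) != len(levels):
        raise ValueError("items and levels must have the same length")
    items = list(items)
    odd_levels = [lv for lv in levels if lv % 2 == 1]
    if not odd_levels:
        return items  # purely left-to-right line: nothing to reverse
    highest = max(levels)
    lowest_odd = min(odd_levels)
    # From the highest level down to the lowest odd level, reverse each
    # contiguous run of items whose level is at or above the current level.
    for level in range(highest, lowest_odd - 1, -1):
        i = 0
        while i < len(levels):
            if levels[i] >= level:
                j = i
                while j < len(levels) and levels[j] >= level:
                    j += 1
                items[i:j] = reversed(items[i:j])
                i = j
            else:
                i += 1
    return items
```

For instance, reorder_line(list("abCDEf"), [0, 0, 1, 1, 1, 0]) reverses only
the three level-1 items in the middle and leaves the level-0 items in place.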

Regards,
Glenn

[1] http://www.microsoft.com/typography/otfntdev/devanot/features.aspx
[2] http://skynav.trac.cvsdude.com/fop/wiki/ComplexScripts
[3] http://xmlgraphics.apache.org/fop/

On Mon, May 16, 2011 at 10:01 PM, Cameron McCormack <cam@mcc.id.au> wrote:

> Glenn Adams:
> > I probably shouldn't throw this in, but I wonder how these semantics
> > would handle situations like Example 1 under "Examples [of] Devanagari
> > syllables" in [1], where a sequence of 12 Unicode characters maps to a
> > sequence of 4 glyphs, and where the inverse association from glyph to
> > generating character indices are as follows:
> >
> > Glyph  Char
> > Index  Indices
> >
> > 0   <- {10}
> > 1   <- {2,3,4}
> > 2   <- {5,6,7,8,9}
> > 3   <- {0,1,11}
> >
> > The semantics of associating x offsets (and similar properties) with
> > characters as opposed to glyphs seems rather disconnected to me.
>
> What I would expect from the above example is that if you have
>
>  <text x="10 20 30 40 50 60 70 80 90 100 110 120 130 140">
>    azzzzzzzzzzzzb
>  </text>
>
> where the zs are the 12 characters mapping to the single glyph that you
> mention, then you would get the “a” glyph at x = 10, the Devanagari
> glyph at x = 20 and the b glyph at x = 140.  That’s what the rules in
> http://www.w3.org/TR/SVG/text.html#TSpanElement say to do (the third
> bullet beneath “The following additional rules apply …”).
>
> > One can certainly talk about associating x offsets with the output
> > glyphs, but attempting to associate such properties with input
> > characters that may be subjected to a complex, non-continuous,
> > disjoint mapping to glyphs seems questionable, except in the special
> > case of 1:1 continuous mappings.
>
> Yes, the x="" attribute there shouldn’t break up the complex glyph.
>
> --
> Cameron McCormack ≝ http://mcc.id.au/
>
Received on Tuesday, 17 May 2011 05:06:30 UTC