Re: ligature formation across text chunks from Cameron McCormack on 2011-05-17 (public-svg-wg@w3.org from April to June 2011)

From: Cameron McCormack <cam@mcc.id.au>
Date: Wed, 18 May 2011 11:46:03 +1200
To: Glenn Adams <glenn@skynav.com>
Cc: Vincent Hardy <vhardy@adobe.com>, "public-svg-wg@w3.org" <public-svg-wg@w3.org>
Message-ID: <20110517234603.GB3424@wok.mcc.id.au>

Hi Glenn.

Glenn Adams:
> Perhaps I wasn't clear, but in this example (see Example #1 in [1]), 12
> Devanagari Unicode characters map to 4 glyphs in a Devanagari font, where:
> 
>    - the 1st glyph, a base glyph denoting a vowel, is derived from the char
>    at index 10
>    - the 2nd glyph, a base glyph denoting the half form of a nukta-ized
>    consonant, is derived from chars at index 2, 3, and 4
>    - the 3rd glyph, a base glyph denoting a ligature of three consonants, is
>    derived from chars at index 5, 6, 7, 8, and 9
>    - the 4th glyph, a combining glyph denoting a ligature of two combining
>    marks, is derived from chars at index 0, 1, and 11

OK.  So the difference here from the fourth dot point in the spec I
referred to earlier is that the spec describes a many-to-many
character-to-glyph mapping without considering reordering of the glyphs.
In this example, however, we can’t simply partition the characters into
their corresponding glyphs without reordering them.

I think the spec *could* be extended to note this case, and say
something like: position of each glyph is determined by the first x=""
value in document order that corresponds to that glyph.  If we have four
glyphs, then you could do this:

  <text x="40 0 20 0 0 30 0 0 0 0 10 0">zzzzzzzzzzzz</text>
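To make the proposed rule concrete, here is a minimal Python sketch that builds the character-indexed x="" list from glyph positions. It assumes a hypothetical `CLUSTERS` map holding Glenn's character-to-glyph assignment and glyph positions of 10, 20, 30 and 40 (as in the glyph-x="" example further down); `char_x_values` is an illustrative name, and the 0s are only placeholders for characters whose x value would be ignored.

```python
# Hypothetical cluster map for Glenn's Example #1:
# glyph number (visual order) -> indexes of the characters it came from.
CLUSTERS = {
    1: [10],
    2: [2, 3, 4],
    3: [5, 6, 7, 8, 9],
    4: [0, 1, 11],
}

# Assumed glyph positions, matching glyph-x="10 20 30 40".
GLYPH_X = {1: 10, 2: 20, 3: 30, 4: 40}


def char_x_values(clusters, glyph_x, num_chars, filler=0):
    """Build a character-indexed x="" list: each glyph's position goes on
    the first character (in document order) that maps to that glyph, and
    every other slot gets a placeholder value."""
    xs = [filler] * num_chars
    for glyph, chars in clusters.items():
        xs[min(chars)] = glyph_x[glyph]
    return xs


xs = char_x_values(CLUSTERS, GLYPH_X, 12)
print(" ".join(str(v) for v in xs))  # 40 0 20 0 0 30 0 0 0 0 10 0
```

The values land on the 0th, 2nd, 5th and 10th characters, i.e. the first character of each glyph in document order.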

Would that fourth glyph, which is a combining glyph, be considered a
separate glyph for positioning, though?  I think the spec is clear that
for cases like

  <text x="10 20">a&#x308;</text>

the “a” is placed at x = 10 and the combining mark is placed above
the “a” and not at x = 20.  What determines that the fourth glyph in the
example is treated as a combining glyph?  Especially if that glyph
doesn’t correspond to a Unicode character?
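For characters, the information is at least available from Unicode properties. A simplified Python model (not the full SVG positioning rules; `is_mark` and `assign_x` are hypothetical names) shows the a&#x308; case: the mark is recognised by its general category and keeps its base's x instead of taking 20.

```python
import unicodedata


def is_mark(ch: str) -> bool:
    """True for Unicode combining marks (general categories Mn, Mc, Me)."""
    return unicodedata.category(ch).startswith("M")


def assign_x(text: str, xs: list) -> list:
    """Simplified model of character-indexed x="": a base character takes
    the x value at its own index, while a combining mark ignores its own
    value and stays with the preceding base."""
    placed = []
    for ch, x in zip(text, xs):
        if is_mark(ch) and placed:
            placed.append((ch, placed[-1][1]))
        else:
            placed.append((ch, x))
    return placed


# <text x="10 20">a&#x308;</text>: U+0308 COMBINING DIAERESIS follows "a".
for ch, x in assign_x("a\u0308", [10, 20]):
    print(f"U+{ord(ch):04X} category={unicodedata.category(ch)} x={x}")
# U+0061 category=Ll x=10
# U+0308 category=Mn x=10
```

A bare glyph ID has no Unicode general category, so `is_mark` has nothing to look up for the fourth glyph in Glenn's example; that classification would have to come from the font (for example an OpenType GDEF glyph class) or from the author.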

> At a minimum, an author should be able to perform the character to glyph
> mapping (and origin assignment) at authoring time, and use SVG to display
> the four glyphs at their desired locations.

I think I am still a bit confused: in our example, are we still using
the implementation’s layout engine to determine the glyphs, and the
order in which they come out of the sequence of characters in the
document?  If so, then the above x="" attribute should work, since both
we (the author, or the tool generating the markup) and the
implementation know how those characters are going to map to glyphs and
the order in which they’ll appear; it’s just the exact position of these
glyphs that we want to specify.

But if we don’t rely on the implementation’s layout engine, then we, the
author, do need to specify how those characters map to glyphs, since the
implementation cannot know that the 0th, 2nd, 5th and 10th characters
are the first ones in document order that correspond to the four glyphs.
That would allow us to keep using x="", which is indexed by character,
but I can see that it would be simpler to have a glyph-x="" attribute
that was indexed by the ordered glyphs instead.
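The difference between the two attributes can be sketched in Python (hypothetical function names, same assumed cluster map as Glenn's example): resolving glyph positions from a character-indexed x="" needs the character-to-glyph map, while a glyph-indexed list needs nothing beyond glyph order.

```python
# Hypothetical cluster map from Glenn's Example #1
# (glyph number in visual order -> source character indexes).
CLUSTERS = {1: [10], 2: [2, 3, 4], 3: [5, 6, 7, 8, 9], 4: [0, 1, 11]}


def glyph_positions_from_char_x(clusters, char_x):
    """Character-indexed x="": each glyph reads the value stored on the first
    character (in document order) that maps to it, so the map is required."""
    return {g: char_x[min(chars)] for g, chars in clusters.items()}


def glyph_positions_from_glyph_x(glyph_x):
    """Glyph-indexed values (the proposed glyph-x=""): the n-th value belongs
    to the n-th glyph, so no character-to-glyph map is needed."""
    return {n: x for n, x in enumerate(glyph_x, start=1)}


char_x = [40, 0, 20, 0, 0, 30, 0, 0, 0, 0, 10, 0]
print(glyph_positions_from_char_x(CLUSTERS, char_x))   # {1: 10, 2: 20, 3: 30, 4: 40}
print(glyph_positions_from_glyph_x([10, 20, 30, 40]))  # {1: 10, 2: 20, 3: 30, 4: 40}
```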

> Since the glyph identifiers (glyph codes) used to refer to these
> glyphs are not in Unicode (because these are glyphs and not chars),
> then some means other than Unicode characters should be available
> to serve this function. That is my understanding of the purpose of
> <glyphRef/>.

Assuming we’re not using the implementation’s text shaping engine, you
would want to output (assuming we’re using SVG fonts):

  <font>
    <glyph id="g1" d="..."/>
    <glyph id="g2" d="..."/>
    <glyph id="g3" d="..."/>
    <glyph id="g4" d="..."/>
  </font>
  <altGlyphDef id="a">
    <glyphRef xlink:href="#g1"/>
    <glyphRef xlink:href="#g2"/>
    <glyphRef xlink:href="#g3"/>
    <glyphRef xlink:href="#g4"/>
  </altGlyphDef>
  <!-- glyph-x is the proposed glyph-indexed attribute, not part of SVG 1.1 -->
  <text glyph-x="10 20 30 40"><altGlyph
    xlink:href="#a">zzzzzzzzzzzz</altGlyph></text>

is that right?  The glyph-x="" gives the position of each glyph (or we
could keep using x="").  We still don’t have the mapping of character
indexes to glyph indexes, which would be necessary for text selection to
work properly.  The glyphs would be positioned correctly, though,
because we just ensure that the <glyphRef>s are in the order glyph-x
expects.
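For selection, the missing piece is the inverse map from character index to glyph. A rough Python sketch, assuming a hypothetical cluster map were available alongside the <altGlyphDef> (SVG 1.1 has no construct for it): a selected character range maps to the set of glyphs to highlight.

```python
# Hypothetical character-to-glyph information that the <altGlyphDef> markup
# above does not carry (glyph number -> source character indexes).
CLUSTERS = {1: [10], 2: [2, 3, 4], 3: [5, 6, 7, 8, 9], 4: [0, 1, 11]}


def char_to_glyph(clusters):
    """Invert the cluster map: character index -> glyph number."""
    return {c: g for g, chars in clusters.items() for c in chars}


def glyphs_to_highlight(clusters, start, end):
    """Glyph numbers touched by the character selection [start, end).
    A glyph is highlighted whole, even if only some of its characters are
    selected (a simplification)."""
    lookup = char_to_glyph(clusters)
    return sorted({lookup[c] for c in range(start, end)})


print(glyphs_to_highlight(CLUSTERS, 0, 2))  # [4]
print(glyphs_to_highlight(CLUSTERS, 2, 6))  # [2, 3]
```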

> Further, the authoring tool may wish to output these glyphs in an order that
> is distinct from either the original logical character order or from the
> visual order. For example, it is most convenient to interchange the 3rd and
> 4th glyphs, so that the combining mark can be assigned an advancement of 0,
> and be followed by the base glyph whose origin is derived from the advancement
> and origin of the prior (2nd) base glyph. Otherwise, the origin of the
> combining glyph would need to take into account the advancement of the base
> glyph on which it is to be attached.

So you would want to be allowed to output the <glyphRef> elements in any
order (and then correspondingly order the glyph-x="" attribute’s values).
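Glenn's point about output order can be checked with simple pen arithmetic. This Python sketch assumes made-up advances (10 for each base glyph, 0 for the combining mark) and a hypothetical `OutGlyph` record; each glyph's origin is the pen position before its own advance is added.

```python
from dataclasses import dataclass


@dataclass
class OutGlyph:
    name: str
    advance: float  # horizontal advance; 0 for the combining mark


def origins(glyphs, start: float = 0.0):
    """Give each glyph the current pen position, then advance the pen."""
    pen = start
    result = []
    for g in glyphs:
        result.append((g.name, pen))
        pen += g.advance
    return result


natural = [OutGlyph("g1", 10), OutGlyph("g2", 10), OutGlyph("g3", 10), OutGlyph("g4", 0)]
swapped = [OutGlyph("g1", 10), OutGlyph("g2", 10), OutGlyph("g4", 0), OutGlyph("g3", 10)]
print(origins(natural))  # g4's origin falls after g3's advance, at 30.0
print(origins(swapped))  # g4 and g3 share the origin 20.0
```

In the swapped order the mark's origin coincides with that of the base glyph emitted after it, so the mark no longer depends on that base's advance; the glyph-x="" values would be listed in the same swapped order.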

> Now, it appears that <text/>, etc., are defined to operate in the character
> domain, and not the glyph domain. However, properties such as x, y, dx, dy,
> rotate, are clearly applicable to the glyph domain, and not the character
> domain. As long as one works with simple writing systems that have a
> general 1:1 character to glyph mapping, then this doesn't appear to be a
> problem. However, in complex scripts, such as Arabic, and the family of
> Indic scripts, this is a rather serious problem of mixing apples and
> oranges, since a 1:1 mapping is the exception, not the rule.

I think m:n mappings still work with character indexes, but only if the
characters that correspond to a given glyph are contiguous.
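The contiguity condition is easy to state as code. A small Python sketch with hypothetical names, applied to a simple Latin case (say "ffi" ligated into one glyph, followed by "x") and to Glenn's Devanagari mapping:

```python
def contiguous(indexes):
    """True if the character indexes form one unbroken run."""
    s = sorted(indexes)
    return not s or s == list(range(s[0], s[-1] + 1))


def char_indexable(clusters):
    """True if every glyph's source characters are contiguous, which is
    the condition for addressing an m:n mapping by character index."""
    return all(contiguous(chars) for chars in clusters.values())


latin = {1: [0, 1, 2], 2: [3]}  # "ffi" ligature glyph, then "x"
devanagari = {1: [10], 2: [2, 3, 4], 3: [5, 6, 7, 8, 9], 4: [0, 1, 11]}
print(char_indexable(latin))       # True
print(char_indexable(devanagari))  # False: glyph 4 uses chars 0, 1 and 11
```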

> However, perhaps I am missing something here, so the problem may merely be
> my lack of full understanding of the defined mechanisms.

Likely the opposite.

> One reason I am particularly interested in this at the moment is that I am
> in the process of adding Complex Script Support [2] to the Apache FOP
> Project [3], an implementation of XSL-FO, and in due course, I need to
> output glyphs with x and y origin offsets, and x and y advancements, on a
> per-glyph basis. I already have this process working for PDF output, but SVG
> is on my list for output support, so I shall presently have to deal with
> SVG.

OK.

> Whether SVG can natively support both the necessary bidi processing and the
> complex character-to-glyph mapping (via OpenType advanced typography tables
> or equivalent in TrueType) remains a question. But that is the problem you
> have if you want SVG to handle character to glyph mapping even at line
> level. Note that in general, bidi processing operates first in the character
> domain on an entire paragraph, then secondly in the glyph domain (after line
> breaking) when reordering bidi embedding level segments. Note also that line
> breaking generally operates on the glyph domain, since characters have no
> geometry (only glyphs do).

Right.  (I have just been becoming more familiar with bidi layout over
the last week or so.)
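For the line-level reordering Glenn mentions, rule L2 of the Unicode Bidirectional Algorithm (UAX #9) is short enough to sketch. This Python version assumes the embedding levels have already been resolved by the earlier paragraph-level steps, works on whatever units it is given (characters or glyphs), and skips rules L1, L3 and L4; `reorder_line` is a hypothetical name.

```python
def reorder_line(items, levels):
    """UAX #9 rule L2: from the highest level on the line down to the lowest
    odd level, reverse each maximal run of items at that level or higher."""
    pairs = list(zip(items, levels))
    odd_levels = [lv for lv in levels if lv % 2 == 1]
    if not odd_levels:
        return list(items)  # no odd (right-to-left) levels: nothing to reverse
    for level in range(max(levels), min(odd_levels) - 1, -1):
        i = 0
        while i < len(pairs):
            if pairs[i][1] < level:
                i += 1
                continue
            j = i
            while j < len(pairs) and pairs[j][1] >= level:
                j += 1
            pairs[i:j] = pairs[i:j][::-1]  # reverse this run in place
            i = j
    return [item for item, _ in pairs]


print("".join(reorder_line("abcDEFghi", [0, 0, 0, 1, 1, 1, 0, 0, 0])))  # abcFEDghi
print("".join(reorder_line("aBCdeFg", [0, 1, 1, 2, 2, 1, 0])))  # aFdeCBg
```

Here the capitals stand for right-to-left text at level 1, and "de" for, say, digits at level 2 inside it: they keep their left-to-right order while the surrounding run is reversed.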

-- 
Cameron McCormack ≝ http://mcc.id.au/
Received on Tuesday, 17 May 2011 23:46:40 UTC