RE: Ligatures and caret position

This is a personal response.

For text editing, caret movement (and text selection) is usually done at the "grapheme cluster" level. See Character Model: Fundamentals, section 6.1, the "Note" at the bottom. Grapheme cluster boundaries may need to be tailored on a per-language basis.

Basically, a grapheme cluster and a ligature are separate things. In your example, as the user moves the caret to the right in "donkey", the "nke" ligature would be broken and the text re-rendered with the caret's "glyph" between the 'n' and the 'k'. This is logical for that script (Latin) and the English language. It might not be true in another language: consider the example given in CharMod, the letter 'ch' in Serbian.

If the "ligature" is really a character followed by one or more combining marks (for example, 'e' followed by U+0301 or the Thai sequence U+0E01 U+0E33), then the caret jumps over the complete sequence (or the selection selects the complete sequence) on the first "right arrow".
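To make the distinction concrete, here is a rough sketch in Python of caret movement by (approximate) grapheme cluster. It is only an illustration under simplifying assumptions, not a full implementation of the UAX #29 rules: it treats any character whose general category is a mark (Mn, Mc, Me) as extending the preceding cluster, plus the two code points that UAX #29 lists as spacing marks even though their category is not a mark (U+0E33 and U+0EB3). Hangul syllables, emoji sequences, and language tailorings are not handled. The names next_caret, prev_caret, and extends_cluster are made up for this example.

```python
import unicodedata

# Code points that UAX #29 treats as SpacingMark even though their general
# category is not a mark: THAI CHARACTER SARA AM and LAO VOWEL SIGN AM.
EXTRA_EXTENDERS = {"\u0e33", "\u0eb3"}


def extends_cluster(ch: str) -> bool:
    """Approximation: does this character attach to the preceding cluster?"""
    return unicodedata.category(ch) in ("Mn", "Mc", "Me") or ch in EXTRA_EXTENDERS


def next_caret(text: str, pos: int) -> int:
    """Caret position after one "right arrow" from pos (a code point index)."""
    if pos >= len(text):
        return len(text)
    pos += 1  # step over the base character
    while pos < len(text) and extends_cluster(text[pos]):
        pos += 1  # keep going past anything that extends the cluster
    return pos


def prev_caret(text: str, pos: int) -> int:
    """Caret position after one "left arrow" from pos (a code point index)."""
    if pos <= 0:
        return 0
    pos -= 1
    while pos > 0 and extends_cluster(text[pos]):
        pos -= 1  # back up until we land on the base character
    return pos
```

With this sketch, next_caret("donkey", 2) stops between the 'n' and the 'k' regardless of whether the font ligates "nke", while next_caret("e\u0301", 0) and next_caret("\u0e01\u0e33", 0) both jump over the whole two-code-point sequence. Whether the renderer breaks the ligature at that point is a separate decision from where the caret may stop.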

Note that there are cases (Arabic) where you will not wish to actually break ligation when moving the caret.

Hope this helps. No doubt a more complete and well-considered response will come from the WG shortly.

Best Regards,


Addison P. Phillips
Globalization Architect, Quest Software
Chair, W3C Internationalization Core Working Group

Internationalization is not a feature.
It is an architecture.

> -----Original Message-----
> From: [mailto:public-i18n-core-] On Behalf Of Robin Berjon
> Sent: 2 November 2005 6:35
> To:
> Cc: SVG WG
> Subject: Ligatures and caret position
> Dear I18N Core WG,
> I am writing to you on behalf of the SVG WG. We have added support
> for editable text to the SVG Tiny 1.2 specification, which entails
> that during editing, a caret would typically be available (in visual
> media) for the user to know the current editing position in the text.
> Since we also support ligatures (including in SVG fonts), we were
> wondering how the caret should interact with them.
> Let's assume that the user is editing the word "donkey", and that the
> "nke" characters were ligatured so as to render as an "X", giving us
> "doXy".
> The user has the caret after the "o", like so: do|Xy. If she moves
> the caret one step further (in this case, to the right), what should
> happen? We can see several options:
>   - the caret moves straight to after the glyph: doX|y. This is
> unintuitive for, say, an "fi" ligature, but may be the best option
> for some scripts.
>   - same as the above, but script/language dependent so that it's
> (almost) always intuitive.
>   - the caret doesn't move until it has been advanced (say, using the
> right arrow key) as many times as there are characters in the
> ligature. For instance, if the caret would conceptually be on "don|
> key" or "donk|ey" the rendering would still be "do|Xy". We believe
> that this is not a very intuitive option.
>   - each time the caret is advanced, it progresses by the advance of
> the ligature glyph divided by the number of characters. This has the
> advantage that the user gets feedback for her actions, but we suspect
> that the result may be completely wrong in some situations.
>   - we leave it completely up to implementations, hoping they'll get
> it right.
> Note that everywhere in the above that concerns itself with advancing
> the caret would be expected to work conversely when it is moving
> backwards as well.
> We would greatly appreciate your guidance in this issue. Thanks!
> --
> Robin Berjon
>     Senior Research Scientist
>     Expway,


Received on Wednesday, 2 November 2005 15:40:54 UTC