Re: Role of the font in deciding the cluster boundaries

Richard Ishida scripsit:

> U+0928 DEVANAGARI LETTER NA
> U+094D DEVANAGARI SIGN VIRAMA
> U+0926 DEVANAGARI LETTER DA
> U+0940 DEVANAGARI VOWEL SIGN II
> (this is NOT a grapheme cluster - it's two)

Absolutely, and normally it will be rendered as a single orthographic
syllable.  However, if a font has neither an N-D conjunct nor a half-form
of N, it may render this as two orthographic syllables: N with a visible
virama, and full D with II, just as if it had been encoded NA + VIRAMA +
ZWNJ + DA + II.
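
To make the comparison concrete, here is a small Python sketch (standard
library only; the variable and function names are mine, not from any spec)
that lists the code points of the two sequences.  They differ only by the
ZWNJ, which is why a font lacking both the conjunct and the half-form can
end up displaying them the same way.

```python
import unicodedata

NA, VIRAMA, DA, II = "\u0928", "\u094D", "\u0926", "\u0940"
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER

plain = NA + VIRAMA + DA + II
with_zwnj = NA + VIRAMA + ZWNJ + DA + II


def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


for label, text in (("plain", plain), ("with ZWNJ", with_zwnj)):
    print(label)
    for line in describe(text):
        print("  " + line)
```

Nothing here consults a font, so it can only show what was encoded, not
how many orthographic syllables will appear on screen.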

That's not normal for Devanagari, but it is possible: see the first
paragraph of TUS 6.2 p. 282 (physical page 7 of the PDF for Chapter 9).
In Tamil, it's normal, and in Malayalam, it's orthography-dependent
(old vs. new).  So just looking at the characters doesn't allow you to
tell how many orthographic syllables (as opposed to extended grapheme
clusters) are in use.

Therefore, if you have a feature that depends on the orthographic
syllable, you'll need, in the general case, to ask the font how many
syllables you have.  Which features actually *do* depend on orthographic
syllables is a different question, on which I can shed no light.
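
As a rough illustration of what "asking the font" means, the following
Python sketch uses a toy font model.  ToyFont, its two fields, and
count_orthographic_syllables are all hypothetical; a real implementation
would have to query the font's shaping tables through a shaping engine,
and this sketch makes no claim about how any engine does that.  It also
assumes the input is a single would-be syllable and only counts the
breaks inside its consonant cluster.

```python
from dataclasses import dataclass, field

VIRAMA = "\u094D"
ZWNJ = "\u200C"


@dataclass
class ToyFont:
    """Hypothetical stand-in for what a real font's tables would tell us."""
    conjuncts: set = field(default_factory=set)   # (first, second) pairs with a ligature
    half_forms: set = field(default_factory=set)  # consonants that have a half-form


def is_consonant(ch):
    # Devanagari KA..HA only; enough for this illustration.
    return "\u0915" <= ch <= "\u0939"


def count_orthographic_syllables(text, font):
    """Count syllables, adding one wherever the virama must be shown visibly."""
    count = 1 if text else 0
    for i, ch in enumerate(text):
        if not (is_consonant(ch) and text[i + 1:i + 2] == VIRAMA):
            continue
        rest = text[i + 2:]
        if rest[:1] == ZWNJ and is_consonant(rest[1:2]):
            # An explicit ZWNJ asks for a visible virama regardless of the font.
            count += 1
        elif is_consonant(rest[:1]):
            joined = (ch, rest[0]) in font.conjuncts or ch in font.half_forms
            if not joined:
                # The font has no way to join the two consonants.
                count += 1
    return count


NA, DA, II = "\u0928", "\u0926", "\u0940"
text = NA + VIRAMA + DA + II

rich_font = ToyFont(half_forms={NA})
poor_font = ToyFont()

print(count_orthographic_syllables(text, rich_font))
print(count_orthographic_syllables(text, poor_font))
```

The same string gives a different count depending on which font object is
passed in, which is the whole point: the characters alone don't settle it.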

-- 
I suggest you solicit aid of my followers       John Cowan
or learn the difficult art of mud-breathing.    cowan@ccil.org
        --Great-Souled Sam                      http://www.ccil.org/~cowan

Received on Friday, 14 March 2014 04:52:31 UTC