RE: comments on Character Model for the World Wide Web: String Matching and Searching

> On 6/19/2014 11:27 AM, Najib Tounsi wrote:
> > On 6/19/14 2:51 PM, Matitiahu Allouche wrote:
> >>
> >> 11) In 2.2 table of Compatibility Equivalence, the third example is
> >> labelled "Cursive forms". I think that this would be better labelled
> >> "character shapes". Rationale: the example shows various shapes of an
> >> Arabic letter. But similar examples could be taken from final versus
> >> non-final shapes of some Hebrew letters, or from the final versus
> >> non-final shapes of the Greek sigma letter. Hebrew and Greek are not
> >> cursive scripts, so the issue here is having position-dependent
> >> shapes, not cursiveness.
> The Greek final sigma uses a different character code which is not a
> compatibility equivalent.
> The reason is that, unlike Arabic positional shaping, the selection of the final
> form cannot be determined algorithmically at rendering time and would
> otherwise introduce the need to use ZWNJ with Greek; not a good tradeoff.
> Whatever example is used needs to be limited to cases of automatic shape
> selection at rendering.

Context matters here. The table is not just a list of characters that use contextual shaping. It *specifically* shows characters that have compatibility decompositions in Unicode, and it illustrates the various kinds of compatibility decomposition. I tend to agree with Mati's comment that "cursive forms" is not a very accurate label. In practice, though, only Arabic uses the <initial>, <medial>, <final>, and <isolated> decomposition tags, so the other examples offered (Hebrew final letters, Greek final sigma) are not what the table is meant to illustrate. The items in the table are the four compatibility variants of ARABIC LETTER NOON (U+0646).

Note that this table is identical to Figure 2 in UAX#15.
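
For anyone who wants to check the distinction, here is a rough sketch using Python's standard unicodedata module. It assumes the four table entries are the Arabic Presentation Forms-B code points U+FEE5..U+FEE8; the helper name compat_tag is made up for this illustration.

```python
import unicodedata

# Assumed code points: the four presentation forms of ARABIC LETTER NOON.
NOON_FORMS = {
    "isolated": "\uFEE5",
    "final": "\uFEE6",
    "initial": "\uFEE7",
    "medial": "\uFEE8",
}

def compat_tag(ch):
    """Return the compatibility tag (e.g. '<final>') of ch, or None if it has none."""
    decomp = unicodedata.decomposition(ch)
    if decomp.startswith("<"):
        return decomp.split()[0]
    return None

for label, ch in NOON_FORMS.items():
    folded = unicodedata.normalize("NFKC", ch)
    # Each form carries a positional tag and folds to U+0646 under NFKC.
    print(f"U+{ord(ch):04X} {label}: tag={compat_tag(ch)} NFKC=U+{ord(folded):04X}")

# GREEK SMALL LETTER FINAL SIGMA has no decomposition at all,
# so it is not a compatibility equivalent of U+03C3.
print("U+03C2 tag:", compat_tag("\u03C2"))
```

The Arabic forms all report a positional tag and normalize to U+0646, while the Greek final sigma reports no decomposition, which is why it would not belong in this table.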


Received on Thursday, 19 June 2014 19:44:58 UTC