RE: comments on Character Model for the World Wide Web: String Matching and Searching

Given the input from Najib and Asmus, I withdraw my comment and agree that the Arabic shapes are a more appropriate example. However, I am not sure that the title "Cursive forms" is best. I still think that cursiveness is not the main point here. Something like "Position-dependent forms" seems better IMHO (and UAX#15 is not the ultimate truth).
Shalom (Regards),  Mati

-----Original Message-----
From: Phillips, Addison
Sent: Thursday, June 19, 2014 10:44 PM
To: Asmus Freytag; Najib Tounsi; Matitiahu Allouche
Subject: RE: comments on Character Model for the World Wide Web: String Matching and Searching

> On 6/19/2014 11:27 AM, Najib Tounsi wrote:
> > On 6/19/14 2:51 PM, Matitiahu Allouche wrote:
> >>
> >> 11) In 2.2 table of Compatibility Equivalence, the third example is 
> >> labelled "Cursive forms". I think that this would be better 
> >> labelled "character shapes". Rationale: the example shows various 
> >> shapes of an Arabic letter. But similar examples could be taken 
> >> from final versus non-final shapes of some Hebrew letters, or from 
> >> the final versus non-final shapes of the Greek sigma letter. Hebrew 
> >> and Greek are not cursive scripts, so the issue here is having 
> >> position-dependent shapes, not cursiveness.
> The Greek final sigma uses a different character code which is not a 
> compatibility equivalent.
> The reason is that, unlike Arabic positional shaping, the selection of 
> the final form cannot be determined algorithmically at rendering time 
> and would otherwise introduce the need to use ZWNJ with Greek; not a good tradeoff.
> Whatever example is used needs to be limited to cases of automatic 
> shape selection at rendering.

Context matters here. The table does not simply list characters that use contextual shaping. These are *specifically* characters with compatibility decompositions in Unicode, and the table illustrates the various kinds of compatibility decomposition. I tend to agree with Mati's comment that "cursive forms" is not that accurate a label. In practice, though, only Arabic uses the <initial>, <medial>, <final>, and <isolated> decompositions, so the other offered examples are not what the table is meant to illustrate. The items in the table are the four compatibility variations of ARABIC LETTER NOON (U+0646).

Note that this table is identical to Figure 2 in UAX#15.
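
For anyone who wants to check the distinction being discussed, here is a minimal Python sketch using the standard unicodedata module. It is only an illustration: the presentation-form code points U+FEE5 through U+FEE8 and the helper name compat_tag are my own additions rather than anything taken from the table, and the result depends on the Unicode data shipped with the Python build. It reads the compatibility tag for each positional form of NOON, and shows that GREEK SMALL LETTER FINAL SIGMA (U+03C2) has no decomposition at all.

```python
import unicodedata

NOON = "\u0646"  # ARABIC LETTER NOON

# Assumed code points for the four positional presentation forms of NOON
# (isolated, final, initial, medial), from Arabic Presentation Forms-B.
NOON_FORMS = ["\uFEE5", "\uFEE6", "\uFEE7", "\uFEE8"]

FINAL_SIGMA = "\u03C2"  # GREEK SMALL LETTER FINAL SIGMA


def compat_tag(ch):
    """Return the compatibility tag (such as '<final>') of ch, or '' if
    ch has no compatibility decomposition. Hypothetical helper name."""
    decomp = unicodedata.decomposition(ch)
    return decomp.split()[0] if decomp.startswith("<") else ""


# Tag of each positional form, in the order listed above.
tags = [compat_tag(ch) for ch in NOON_FORMS]

# NFKC applies compatibility decompositions, so each form folds to NOON.
folded = [unicodedata.normalize("NFKC", ch) for ch in NOON_FORMS]

# Final sigma is a separate character with no decomposition mapping.
sigma_tag = compat_tag(FINAL_SIGMA)
sigma_nfkc = unicodedata.normalize("NFKC", FINAL_SIGMA)

for ch, tag in zip(NOON_FORMS, tags):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {tag}")
```

If this behaves as I expect, every NOON form folds back to U+0646 under NFKC while the final sigma is left unchanged, which is the difference between the table's examples and the Greek and Hebrew cases.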


Received on Thursday, 19 June 2014 21:25:34 UTC