RE: comments on Character Model for the World Wide Web: String Matching and Searching from Phillips, Addison on 2014-06-19 (www-international@w3.org from April to June 2014)

From: Phillips, Addison <addison@lab126.com>
Date: Thu, 19 Jun 2014 19:44:27 +0000
To: Asmus Freytag <asmusf@ix.netcom.com>, Najib Tounsi <ntounsi@emi.ac.ma>, Matitiahu Allouche <matitiahu.allouche@gmail.com>, "www-international@w3.org" <www-international@w3.org>
Message-ID: <7C0AF84C6D560544A17DDDEB68A9DFB5246FC765@ex10-mbx-36009.ant.amazon.com>

> 
> On 6/19/2014 11:27 AM, Najib Tounsi wrote:
> > On 6/19/14 2:51 PM, Matitiahu Allouche wrote:
> >>
> >> 11) In 2.2 table of Compatibility Equivalence, the third example is
> >> labelled "Cursive forms". I think that this would be better labelled
> >> "character shapes". Rationale: the example shows various shapes of an
> >> Arabic letter. But similar examples could be taken from final versus
> >> non-final shapes of some Hebrew letters, or from the final versus
> >> non-final shapes of the Greek sigma letter. Hebrew and Greek are not
> >> cursive scripts, so the issue here is having position-dependent
> >> shapes, not cursiveness.
> 
> The Greek final sigma uses a different character code which is not a
> compatibility equivalent.
> 
> The reason is that, unlike Arabic positional shaping, the selection of the final
> form cannot be determined algorithmically at rendering time and would
> otherwise introduce the need to use ZWNJ with Greek; not a good tradeoff.
> 
> Whatever example is used needs to be limited to cases of automatic shape
> selection at rendering.
> 

Context matters here. The table is not merely a list of characters that use contextual shaping. These are *specifically* characters with compatibility decompositions in Unicode, and the table illustrates the various kinds of compatibility decomposition. I tend to agree with Mati that "cursive forms" is not a very accurate label. In practice, though, only Arabic uses the <initial>, <medial>, <final>, and <isolated> decompositions, so the other examples offered are not what the table is meant to illustrate. The items in the table are the four compatibility variants of ARABIC LETTER NOON (U+0646).
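The distinction can be checked against the Unicode Character Database. The following is a minimal sketch using only Python's standard `unicodedata` module; the helper `compat_info` and the constant names are hypothetical, chosen for illustration. It prints the raw decomposition mapping and the NFKC result for the four presentation forms of NOON, and for two of the position-dependent letters raised in the thread (Greek final sigma, Hebrew final kaf).

```python
import unicodedata

# Hypothetical constant: the four presentation forms of ARABIC LETTER NOON
# (U+0646) in the Arabic Presentation Forms-B block.
NOON_FORMS = ["\uFEE5", "\uFEE6", "\uFEE7", "\uFEE8"]

# Hypothetical constant: position-dependent letters from Greek and Hebrew,
# which are encoded as ordinary characters rather than compatibility variants.
OTHER_FINALS = ["\u03C2", "\u05DA"]  # GREEK SMALL LETTER FINAL SIGMA, HEBREW LETTER FINAL KAF


def compat_info(ch: str) -> tuple[str, str, str]:
    """Return (character name, raw UCD decomposition mapping, NFKC form)."""
    return (
        unicodedata.name(ch),
        unicodedata.decomposition(ch),  # '' when the character has no mapping
        unicodedata.normalize("NFKC", ch),
    )


if __name__ == "__main__":
    for ch in NOON_FORMS + OTHER_FINALS:
        name, decomp, nfkc = compat_info(ch)
        # Each input here normalizes to a single code point, so ord() is safe.
        print(f"U+{ord(ch):04X} {name}: decomposition={decomp!r}, "
              f"NFKC -> U+{ord(nfkc):04X}")
```

Each NOON form should report a positional tag (`<isolated>`, `<final>`, `<initial>`, `<medial>`) and fold to U+0646 under NFKC. Final sigma and final kaf have no decomposition mapping, so NFKC leaves them unchanged, which is why they would not fit a table of compatibility equivalents.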

Note that this table is identical to Figure 2 in UAX #15 (Unicode Normalization Forms).

Addison

Received on Thursday, 19 June 2014 19:44:58 UTC