RE: comments on Character Model for the World Wide Web: String Matching and Searching from Matitiahu Allouche on 2014-06-19 (www-international@w3.org from April to June 2014)

From: Matitiahu Allouche <matitiahu.allouche@gmail.com>
Date: Fri, 20 Jun 2014 00:25:01 +0300
To: "'Phillips, Addison'" <addison@lab126.com>, "'Asmus Freytag'" <asmusf@ix.netcom.com>, "'Najib Tounsi'" <ntounsi@emi.ac.ma>, <www-international@w3.org>
Message-ID: <0f7601cf8c04$ee803090$cb8091b0$@gmail.com>

Given the inputs from Najib and Asmus, I withdraw my comment and agree that the Arabic shapes are a more appropriate example.  However, I am not sure that the title "Cursive forms" is best. I still think that cursiveness is not the main point here. Something like "Position-dependent forms" seems better IMHO (and UAX#15 is not the ultimate truth).
--
Shalom (Regards),  Mati


-----Original Message-----
From: Phillips, Addison [mailto:addison@lab126.com] 
Sent: Thursday, June 19, 2014 10:44 PM
To: Asmus Freytag; Najib Tounsi; Matitiahu Allouche; www-international@w3.org
Subject: RE: comments on Character Model for the World Wide Web: String Matching and Searching

> 
> On 6/19/2014 11:27 AM, Najib Tounsi wrote:
> > On 6/19/14 2:51 PM, Matitiahu Allouche wrote:
> >>
> >> 11) In 2.2 table of Compatibility Equivalence, the third example is 
> >> labelled "Cursive forms". I think that this would be better 
> >> labelled "character shapes". Rationale: the example shows various 
> >> shapes of an Arabic letter. But similar examples could be taken 
> >> from final versus non-final shapes of some Hebrew letters, or from 
> >> the final versus non-final shapes of the Greek sigma letter. Hebrew 
> >> and Greek are not cursive scripts, so the issue here is having 
> >> position-dependent shapes, not cursiveness.
> 
> The Greek final sigma uses a different character code which is not a 
> compatibility equivalent.
> 
> The reason is that, unlike Arabic positional shaping, the selection of
> the final form cannot be determined algorithmically at rendering time
> and would otherwise introduce the need to use ZWNJ with Greek; not a
> good tradeoff.
> 
> Whatever example is used needs to be limited to cases of automatic 
> shape selection at rendering.
> 

Context matters here. The table is not just a list of characters that use contextual shaping. These are *specifically* characters with compatibility decompositions in Unicode, and the table illustrates the various kinds of compatibility decomposition. I tend to agree with Mati's comment that "cursive forms" is not that accurate a label. In practice, though, only Arabic uses the <initial>, <medial>, <final>, and <isolated> decomposition tags, so the other examples offered (Hebrew final letters, Greek final sigma) are not what the table is meant to illustrate. The items in the table are the four positional compatibility forms of ARABIC LETTER NOON (U+0646).
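
For anyone who wants to check the distinction, here is a minimal sketch using Python's standard unicodedata module. The helper name compat_tag and the choice of sample characters are my own for illustration and are not taken from the draft or from UAX#15. It assumes the four presentation forms of NOON are U+FEE5 through U+FEE8, and it compares them with Greek final sigma (U+03C2) and Hebrew final nun (U+05DF), which carry no decomposition at all.

```python
import unicodedata

BASE_NOON = "\u0646"  # ARABIC LETTER NOON
# Presentation forms of NOON (Arabic Presentation Forms-B).
NOON_FORMS = ["\uFEE5", "\uFEE6", "\uFEE7", "\uFEE8"]
# Position-dependent letters that are separate characters, not compatibility forms.
OTHER_FINALS = ["\u03C2", "\u05DF"]  # GREEK SMALL LETTER FINAL SIGMA, HEBREW LETTER FINAL NUN


def compat_tag(ch):
    """Return the compatibility tag (e.g. 'final') of ch, or None if it has none."""
    decomposition = unicodedata.decomposition(ch)
    if decomposition.startswith("<"):
        # The field looks like '<final> 0646'; keep only the text inside <...>.
        return decomposition[1:decomposition.index(">")]
    return None


for ch in NOON_FORMS + OTHER_FINALS:
    folded = unicodedata.normalize("NFKC", ch)
    print(f"U+{ord(ch):04X}", unicodedata.name(ch),
          "| tag:", compat_tag(ch),
          "| NFKC:", f"U+{ord(folded):04X}")
```

The four NOON forms each report a positional tag and fold to U+0646 under NFKC, while the Greek and Hebrew final letters report no tag and are left unchanged by normalization.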

Note that this table is identical to Figure 2 in UAX#15.

Addison

Received on Thursday, 19 June 2014 21:25:34 UTC