Re: comments on Character Model for the World Wide Web: String Matching and Searching

From: Asmus Freytag <asmusf@ix.netcom.com>
Date: Thu, 19 Jun 2014 15:11:15 -0700
Message-ID: <53A36003.20304@ix.netcom.com>
To: Matitiahu Allouche <matitiahu.allouche@gmail.com>, "'Phillips, Addison'" <addison@lab126.com>, 'Najib Tounsi' <ntounsi@emi.ac.ma>, www-international@w3.org
On 6/19/2014 2:25 PM, Matitiahu Allouche wrote:
> Given the inputs from Najib and Asmus, I withdraw my comment and agree that the Arabic shapes are a more appropriate example.  However, I am not sure that the title "Cursive forms" is best. I still think that cursiveness is not the main point here. Something like "Position-dependent forms" seems better IMHO (and UAX#15 is not the ultimate truth).
> --

The ultimate truth is that shape selection for Arabic (high end) is 
much more complicated than deciding among four positional 
variants. Codes for these variants were forced into Unicode by the need 
to interoperate with then-existing character sets - I wonder 
when the time comes that these forms can be unequivocally treated as 

(Not all compatibility characters are equal in that regard; some 
continue to be needed for new documents: they represent a distinction 
that is needed in some usage, often technical.)

> Shalom (Regards),  Mati
> -----Original Message-----
> From: Phillips, Addison [mailto:addison@lab126.com]
> Sent: Thursday, June 19, 2014 10:44 PM
> To: Asmus Freytag; Najib Tounsi; Matitiahu Allouche; www-international@w3.org
> Subject: RE: comments on Character Model for the World Wide Web: String Matching and Searching
>> On 6/19/2014 11:27 AM, Najib Tounsi wrote:
>>> On 6/19/14 2:51 PM, Matitiahu Allouche wrote:
>>>> 11) In 2.2 table of Compatibility Equivalence, the third example is
>>>> labelled "Cursive forms". I think that this would be better
>>>> labelled "character shapes". Rationale: the example shows various
>>>> shapes of an Arabic letter. But similar examples could be taken
>>>> from final versus non-final shapes of some Hebrew letters, or from
>>>> the final versus non-final shapes of the Greek sigma letter. Hebrew
>>>> and Greek are not cursive scripts, so the issue here is having
>>>> position-dependent shapes, not cursiveness.
>> The Greek final sigma uses a different character code which is not a
>> compatibility equivalent.
>> The reason is that, unlike Arabic positional shaping, the selection of
>> the final form cannot be determined algorithmically at rendering time
>> and would otherwise introduce the need to use ZWNJ with Greek; not a good tradeoff.
>> Whatever example is used needs to be limited to cases of automatic
>> shape selection at rendering.
> Context matters here. The table is not merely one containing characters that use contextual shaping. These are *specifically* characters with compatibility decompositions in Unicode and the table is illustrating the various kinds of compatibility decomposition. I tend to agree with Mati's comment that "cursive forms" is not that accurate a label. In practice only Arabic uses <initial>, <medial>, <final>, and <isolated> decompositions, though, so the other offered examples are not what the table is meant to illustrate. The items in the table are the four compatibility variations of ARABIC LETTER NOON (U+0646).
> Note that this table is identical to Figure 2 in UAX#15.
> Addison
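
For anyone who wants to check the decompositions discussed in this thread, here is a minimal sketch using Python's standard unicodedata module. It assumes the four positional forms of NOON are the Arabic Presentation Forms-B code points U+FEE5 through U+FEE8; the helper name compat_tag is made up for illustration. Results depend on the Unicode data shipped with the Python build in use.

```python
import unicodedata

NOON = "\u0646"            # ARABIC LETTER NOON
# Assumed code points for the four positional presentation forms of NOON.
NOON_FORMS = ["\uFEE5", "\uFEE6", "\uFEE7", "\uFEE8"]
FINAL_SIGMA = "\u03C2"     # GREEK SMALL LETTER FINAL SIGMA


def compat_tag(ch):
    """Hypothetical helper: return the compatibility tag such as
    '<final>' from the character's decomposition mapping, or None
    when the character has no tagged (compatibility) decomposition."""
    decomp = unicodedata.decomposition(ch)
    if decomp.startswith("<"):
        return decomp.split()[0]
    return None


for ch in NOON_FORMS:
    # Print the code point, its name, and its raw decomposition mapping.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"{unicodedata.decomposition(ch)}")

# Compatibility normalization folds each presentation form to U+0646.
print(all(unicodedata.normalize("NFKC", ch) == NOON for ch in NOON_FORMS))

# Final sigma has no decomposition mapping, so NFKC leaves it unchanged.
print(repr(unicodedata.decomposition(FINAL_SIGMA)))
print(unicodedata.normalize("NFKC", FINAL_SIGMA) == FINAL_SIGMA)
```

The contrast between the two cases is the point made above: the Arabic forms carry <isolated>, <final>, <initial>, and <medial> compatibility decompositions, while the Greek final sigma is a separate character with no such mapping.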
Received on Thursday, 19 June 2014 22:11:27 UTC