Re: [charmod-norm] Case Folding introduction (Section 2.1)

On 2/4/2016 1:25 AM, Martin J. Dürst wrote:
> On 2016/02/04 12:16, klensin via GitHub wrote:
>> klensin has just created a new issue for
>> https://github.com/w3c/charmod-norm:
>>
>> == Case Folding introduction (Section 2.1) ==
>> It may not be relevant (or even, by other measures, correct), but I've
>>   been beaten up several times by scholars of Arabic calligraphy who
>> have claimed that any treatment of the distinction among initial,
>> medial, final, and isolated forms as different from the distinction
>> between upper, lower (and maybe title) case reflects a European script
>>   bias and not actual relationships.
>
> I fully agree with John. I don't have any experience of being beaten
> up by experts on that point, but then only because I never even got
> the idea to make such a point.
>
> Regards,   Martin.

I've responded on GitHub as follows:

I respectfully disagree with those scholars, and beating up people is 
not to be encouraged.

For one, in terms of digital text representation, the various positional
forms for Arabic (or Mongolian) characters are simply different glyphs;
they are selected by the layout engine, and not encoded separately as
characters. (This leaves aside the compatibility characters for Arabic,
which correspond to an earlier attempt and exist as an aid for emulators
and other types of code museums.)
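
To illustrate, here is a minimal Python sketch, assuming only the
standard unicodedata module; the helper name base_letter is made up for
this example. It shows that the four compatibility presentation forms of
BEH all map back to the single encoded letter under NFKC:

```python
import unicodedata

BEH = "\u0628"  # ARABIC LETTER BEH: the one character encoded for the letter

# Compatibility characters from Arabic Presentation Forms-B for the same letter.
PRESENTATION_FORMS = {
    "isolated": "\uFE8F",
    "final": "\uFE90",
    "initial": "\uFE91",
    "medial": "\uFE92",
}


def base_letter(ch: str) -> str:
    """Hypothetical helper: map a presentation form to its base letter via NFKC."""
    return unicodedata.normalize("NFKC", ch)


for position, form in PRESENTATION_FORMS.items():
    # Each compatibility form normalizes to the same base character.
    print(position, unicodedata.name(form), "->", unicodedata.name(base_letter(form)))
```

In ordinary text only U+0628 is stored, and the choice among the four
shapes is left to the layout engine.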

There is a similarity, in that in each case the concept of a "letter"
comes with a set of shapes that the letter can take on. But "casing"
represents a subset: a bi-cameral script, as the name says, has two
sets of forms for each letter, and the choice of form is not one of
typography but of orthography, with conventions for when to use each one
that are based on the content of the text and the intent of the author.

In contrast, the positional forms for cursively connected (and similar) 
scripts are determined solely (or primarily) by the nature of the 
adjacent letters.

Also, the description in section 2.1 conforms to the definition of
casing found elsewhere, e.g. in the Unicode Standard, and there's little
to be gained by suddenly pretending that the term encompasses scripts
that are not bi-cameral (but nevertheless have multiple shapes for the
same letters).

Finally, case folding presupposes both that there are multiple code
points for the same letter and that ignoring the distinction between
them is a common process. (Hiragana and Katakana are an example of two
sets of shapes for the same sound values that are not customarily
folded, even though all users know which two form the set for a given
sound.)
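
The contrast can be seen in a small Python sketch, assuming the built-in
str.casefold and the standard unicodedata module; the function name
same_ignoring_case is hypothetical. Case folding equates upper and lower
case, while neither case folding nor NFKC equates a Hiragana letter with
its Katakana counterpart:

```python
import unicodedata

HIRAGANA_A = "\u3042"  # HIRAGANA LETTER A
KATAKANA_A = "\u30A2"  # KATAKANA LETTER A


def same_ignoring_case(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after Unicode case folding."""
    return a.casefold() == b.casefold()


# Upper and lower case are separate code points that fold together.
print(same_ignoring_case("Unicode", "UNICODE"))

# The two kana are separate code points too, but case folding leaves them distinct.
print(same_ignoring_case(HIRAGANA_A, KATAKANA_A))

# Compatibility normalization does not merge them either.
print(unicodedata.normalize("NFKC", KATAKANA_A) == HIRAGANA_A)
```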

Received on Thursday, 4 February 2016 18:16:23 UTC