Re: Armenian numbering: findings, recommendations and request to CSS from Robert J Burns on 2009-02-15 (www-style@w3.org from February 2009)

From: Robert J Burns <rob@robburns.com>
Date: Sun, 15 Feb 2009 15:49:19 -0600
To: Thomas Phinney <tphinney@cal.berkeley.edu>
Cc: Leif Halvard Silli <lhs@malform.no>, W3C Style List <www-style@w3.org>, fantasai <fantasai.lists@inkedblade.net>
Message-Id: <09F54B65-9937-40BF-907C-FFA9D9EA23E6@robburns.com>
Hi Thomas and Leif,

Thanks for that explanation; that is much clearer to me now. I do think
the previous Unicode sources I cited could still be of some use. They
do provide language-specific collation, but not language-specific
character sets (neither collation character sets nor listing character
sets).

My thinking is that if CSS can reference Unicode wherever possible,
it simplifies the work the CSS WG has to do. For that matter, it might
make some sense for this character set data (sets of language-specific
characters for alphabet, collation and listing) to reside in Unicode's
CLDR, though as I said it is not in there yet as far as I know.
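
As a rough illustration of what such per-language data could look like,
here is a minimal Python sketch. Everything in it is an assumption made
for illustration: the names LETTER_SETS and list_marker are
hypothetical, nothing like this exists in CLDR or CSS as far as I know,
and the Russian sets are derived from Leif's table quoted below on the
assumption that the rows elided there are the same in all three
columns. The wrap-around rule (continuing with two-letter markers once
the letters run out) is likewise just one possible choice.

```python
# Hypothetical per-language letter sets: full alphabet, letters used for
# first-letter collation, and letters used for list markers ("listing").
RUSSIAN_ALPHABET = "АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ"

# Letters dropped at each step, following Leif's table (assumption: the
# rows elided in that table are identical in all three columns).
NOT_IN_COLLATION = set("ЪЬ")
NOT_IN_LISTING = NOT_IN_COLLATION | set("ЁЙЫ")

LETTER_SETS = {
    "ru": {
        "alphabet": RUSSIAN_ALPHABET,
        "collation": "".join(
            c for c in RUSSIAN_ALPHABET if c not in NOT_IN_COLLATION
        ),
        "listing": "".join(
            c for c in RUSSIAN_ALPHABET if c not in NOT_IN_LISTING
        ),
    },
}


def list_marker(index, letters):
    """Return the marker for a 1-based list index.

    Uses bijective base-N numbering over `letters`, so after the last
    single letter the markers continue with two-letter combinations.
    """
    if index < 1:
        raise ValueError("index must be >= 1")
    n = len(letters)
    out = []
    while index > 0:
        index, rem = divmod(index - 1, n)
        out.append(letters[rem])
    return "".join(reversed(out))
```

With this data, list_marker(7, LETTER_SETS["ru"]["listing"]) gives Ж,
whereas the same call with the "alphabet" set gives Ё. The marker
depends only on the item's position and the language's listing set,
never on the text of the item, which is the difference from collation
that Thomas describes.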

Take care,
Rob

On Feb 15, 2009, at 3:06 PM, Thomas Phinney wrote:

> Just trying to clarify differences between collation and
> numbering-using-letters:
>
> Alphabetical collation is about taking a list and putting it in
> alphabetical order according to the text of each item.
> Alphabetic-style numbering-using-letters is about assigning letters to
> list items which are in an arbitrary order, with no necessary relation
> to the beginning letters of the text of each item.
>
> Collation requires rules for processing all possible characters of an
> item, including those which can't occur at the beginning of a word,
> information about characters which should be ignored for collation,
> etc. Numbering-using-letters can ignore those factors.
>
> As long as the various languages using a given writing system don't
> actually have conflicting alphabetization rules, collation can
> usefully use ordering and rules which are a superset of those required
> by any single language. Numbering-using-letters is entirely
> language-specific.
>
> Cheers,
>
> T
>
>
> On Sun, Feb 15, 2009 at 11:58 AM, Leif Halvard Silli  
> <lhs@malform.no> wrote:
>> Robert J Burns 2009-02-14 23.43:
>>>
>>> On Feb 14, 2009, at 3:47 PM, Leif Halvard Silli wrote:
>>>>
>>>> Robert J Burns 2009-02-14 00.25:
>>>>>
>>>>> Hi Leif and fantasai,
>>
>>>> In summary, we are juggling 4 things:
>>>>
>>>>   Alphabet: The particular alphabet in question
>>>>   A: classic systems (Roman, Armenian, classical Greek, classical
>>>>      Church Slavonic, Georgian) based on letters in place of numbers.
>>>>   B: alphabetical collation
>>>>   C: hybrids: saying a, b, c instead of 1, 2, 3
>>>
>>> [...] I'm less clear about the distinction between B and C. [...]
>>
>>>> E.g. for the Russian Cyrillic alphabet, C constitutes a shorter
>>>> version of that alphabet than B does. (And B, in turn, constitutes a
>>>> shorter version of the Russian Cyrillic alphabet.)
>>>
>>> I'm still not understanding the B/C distinction. I think the B and C
>>> distinction then is more about this limiting of subsets to specific
>>> languages. Of course there will be some scripts where the language and
>>> script are reduced to the same thing (like Hebrew) and also where there
>>> is no upper and lower case distinction (e.g., Arabic, and also Hebrew
>>> again).
>>
>> The B/C distinction is defined culturally. Let's look at Russian.
>>
>> Russian
>> Alphabet, Collation, Listing:
>>  [...],   [...],   [...].
>>    Е,       Е,       Е.
>>    Ё,       Ё,       –.
>>  [...],   [...],   [...].
>>    И,       И,       И.
>>    Й,       Й,       –.
>>  [...],   [...],   [...].
>>    Ъ,       –,       –.
>>    Ы,       Ы,       –.
>>    Ь,       –,       –.
>>  [...],   [...],   [...].
>>
>> The Ъ and Ь are non-voiced and only modify the preceding letters;
>> hence they play no role as first letters, whether in collation or in
>> the derived listing/outline format. The Ё perhaps looks too similar to
>> the Е and was also, in general writing, for a long time demoted in
>> favour of Е. The Й also looks very similar to И and is, as a first
>> letter, only used in loan words. And Ы, too, never appears as a first
>> letter except in loan words. (The Ш and the Щ also look similar, but
>> are still kept in listing, probably because they both can be first
>> letters.)
>>
>> In theory, there is no problem using the full alphabet, including the
>> non-voiced letters, for the purpose of list enumeration. After all,
>> upper-norwegian includes Q, X, W, Z, even if they are considered exotic
>> letters in Norwegian. But in practice, the Russian enumeration format
>> is reminiscent of German, where "Ä, Ö, Ü" (and ß) are not part of
>> first-letter collations of native German names and words. There is
>> "something extra" that excludes some of the German as well as Russian
>> letters from being used in lists.
>>
>>> In the case of Armenian being identical for the first 10 letters, that
>>> may be true, but that's just a particular coincidence. The two systems
>>> are still fundamentally different, with Armenian numeric (A) on the one
>>> hand and Armenian alphabetic (B/C) on the other.
>>
>> Right.
>>
>>> So I would say we're really dealing with two situations: letter-based
>>> numeral systems and alphabetic (more broadly in Unicode terminology:
>>> lettered) enumerations.
>>
>> OK, well, of course. There are two basic kinds: letter-based numeral
>> systems and lettered enumerations. But of the latter kind, there are
>> two kinds as well:
>>
>> 1) Strictly alphabetic, collation-based formats, which are rather
>> simple to document as real and existing (but which *may often not* be
>> that much used in academic and other works that typically use lettered
>> enumeration).
>>
>> 2) Culturally defined lettered enumerations, which are documented by
>> their use in academic works (hence LaTeX etc.) and also sometimes
>> defined by standards committees etc.
>>
>> One needs to be aware of both variants in order to get things right.
>> (This task is perhaps somewhat simplified for Latin, because of the
>> strong position of the Basic Modern Latin alphabet.[1])
>>
>> [1] http://en.wikipedia.org/wiki/Basic_modern_Latin_alphabet
>> --
>> leif halvard silli
>>
>>
Received on Sunday, 15 February 2009 21:50:08 UTC