Re: Armenian numbering: findings, recommendations and request to CSS from Thomas Phinney on 2009-02-15 (www-style@w3.org from February 2009)

From: Thomas Phinney <tphinney@cal.berkeley.edu>
Date: Sun, 15 Feb 2009 13:06:21 -0800
To: Leif Halvard Silli <lhs@malform.no>
Cc: Robert J Burns <rob@robburns.com>, W3C Style List <www-style@w3.org>, fantasai <fantasai.lists@inkedblade.net>
Message-ID: <f49ae6ac0902151306q733f2a3ema71ac183640309b0@mail.gmail.com>
Just trying to clarify differences between collation and
numbering-using-letters:

Alphabetical collation is about taking a list and putting it in
alphabetical order according to the text of each item.
Alphabetic-style numbering-using-letters is about assigning letters to
list items which are in an arbitrary order, with no necessary relation
to the beginning letters of the text of each item.
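
A minimal Python sketch of this distinction. The item list and function name are hypothetical and do not come from the thread. Sorting reorders items by their text. Lettered numbering keeps the author's order and assigns markers by position only. Past the last letter, the marker function uses bijective base-k numbering (a ... z, aa, ab ...), the scheme used by CSS's alphabetic counter system. Treating that as the desired overflow behaviour is an assumption.

```python
import string

def alphabetic_marker(n: int, letters: str) -> str:
    """Lettered marker for the n-th item (1-based), bijective base-k.

    1 -> first letter, k -> last letter, k+1 -> first letter doubled.
    """
    if n < 1:
        raise ValueError("lettered markers start at 1")
    k = len(letters)
    digits = []
    while n > 0:
        n -= 1                        # shift to 0-based for this digit
        digits.append(letters[n % k])
        n //= k
    return "".join(reversed(digits))

# Hypothetical list items, in the order the author wrote them.
items = ["pear", "apple", "fig"]

# Collation: reorder the items by their text (plain code-point order here,
# which matches English alphabetical order for lowercase ASCII only).
collated = sorted(items)

# Numbering-using-letters: keep the order, assign letters by position.
numbered = [(alphabetic_marker(i, string.ascii_lowercase), item)
            for i, item in enumerate(items, start=1)]

print(collated)   # ['apple', 'fig', 'pear']
print(numbered)   # [('a', 'pear'), ('b', 'apple'), ('c', 'fig')]
```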

Collation requires rules for processing all possible characters of an
item, including those which can't occur at the beginning of a word,
information about characters which should be ignored for collation,
etc. Numbering-using-letters can ignore those factors.
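
A rough sketch of what "rules for processing all possible characters" involves, using Russian as the example. This is a toy two-level key and not the Unicode Collation Algorithm or any real locale tailoring. Each letter gets a primary weight from its alphabet position. Ё shares Е's primary weight and differs only at a secondary level, in the spirit of Unicode decomposing Ё as Е plus a combining diaeresis. Hyphens and spaces are treated as ignorable, which is an assumption for illustration. Plain code-point sorting puts lowercase ё after я, because U+0451 follows U+044F.

```python
# Toy collation for Russian words (Cyrillic letters only).
RUSSIAN = "АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ"

# Primary weights: alphabet position, with Ё folded onto Е.
PRIMARY = {ch: i for i, ch in enumerate(RUSSIAN.replace("Ё", ""))}
PRIMARY["Ё"] = PRIMARY["Е"]

IGNORABLE = {"-", " "}  # assumption: skipped entirely by this toy collation

def collation_key(word: str) -> tuple:
    letters = [ch for ch in word.upper() if ch not in IGNORABLE]
    # A real collation must weight every character; this toy raises
    # KeyError on anything outside the Russian alphabet.
    primary = tuple(PRIMARY[ch] for ch in letters)
    # Secondary level: only Ё is marked, to break ties with Е.
    secondary = tuple(1 if ch == "Ё" else 0 for ch in letters)
    return (primary, secondary)

words = ["ёж", "ель", "абрикос", "ехать"]

print(sorted(words))                     # ['абрикос', 'ель', 'ехать', 'ёж']
print(sorted(words, key=collation_key))  # ['абрикос', 'ёж', 'ель', 'ехать']
```

Lettered numbering needs none of this machinery. It only needs the ordered sequence of marker letters.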

As long as the various languages used by a given writing system don't
actually have conflicting alphabetization rules, collation can
usefully use ordering and rules which are a superset of those required
by any single language. Numbering-using-letters is entirely
language-specific.

Cheers,

T


On Sun, Feb 15, 2009 at 11:58 AM, Leif Halvard Silli <lhs@malform.no> wrote:
> Robert J Burns 2009-02-14 23.43:
>>
>> On Feb 14, 2009, at 3:47 PM, Leif Halvard Silli wrote:
>>>
>>> Robert J Burns 2009-02-14 00.25:
>>>>
>>>> Hi Leif and fantasai,
>
>>> In summary, we are juggling 4 things:
>>>
>>>    Alphabet: The particular alphabet in question
>>>    A: classic systems (Roman, Armenian, classical Greek, classical Church
>>> Slavonic, Georgian) based on letters in place of numbers.
>>>    B: alphabetical collation
>>>    C: hybrids: saying a, b, c instead of 1, 2, 3
>>
>> [...] I'm less clear about the distinction between B and C. [...]
>
>>> E.g. for the Russian Cyrillic alphabet, C constitutes a shorter version
>>> of that alphabet than B does. (And B, in turn, constitutes a shorter
>>> version of the Russian Cyrillic alphabet.)
>>
>> I'm still not understanding the B/C distinction. I think the B/C
>> distinction, then, is more about limiting subsets to specific
>> languages. Of course there will be some scripts where the language and
>> script are reduced to the same thing (like Hebrew) and also where there is
>> no upper and lower case distinction (e.g., Arabic, and also Hebrew again).
>
> The B/C distinction is defined culturally. Let's look at Russian.
>
> Russian
> Alphabet, Collation, Listing:
>   [...],   [...],   [...].
>     Е,       Е,       Е.
>     Ё,       Ё,       –.
>   [...],   [...],   [...].
>     И,       И,       И.
>     Й,       Й,       –.
>   [...],   [...],   [...].
>     Ъ,       –,       –.
>     Ы,       Ы,       –.
>     Ь,       –,       –.
>   [...],   [...],   [...].
>
> The Ъ and Ь are silent (they have no sound of their own) and only modify
> the preceding letters, hence they play no role as first letters, whether in
> collation or in the derived listing/outline format. The Ё perhaps looks too
> similar to Е, and in general writing it was also demoted in favour of Е for
> a long time. The Й looks very similar to И and is, as a first letter, used
> only in loan words. And Ы never appears as a first letter except in loan
> words. (Ш and Щ also look similar, yet both are kept in listing, probably
> because both can be first letters.)
>
> In theory, there is no problem using the full alphabet, including the
> silent letters, for the purpose of list enumeration. After all,
> upper-norwegian includes Q, X, W, Z, even though they are considered exotic
> letters in Norwegian. But in practice, the Russian enumeration format is
> reminiscent of German, where "Ä, Ö, Ü" (and ß) are not part of first-letter
> collations of native German names and words. There is "something extra"
> that excludes some German as well as Russian letters from being used in
> lists.
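
The Russian table can be read as three nested letter sequences. The sketch below derives them by exclusion from the 33-letter Russian alphabet. It assumes that no letters beyond those marked "–" are dropped, which the table's [...] rows suggest but do not show. All names are hypothetical. Lettered enumeration is positional, so dropping Ё alone already changes the marker of the 7th item.

```python
ALPHABET = "АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ"

# Letters the table marks with "–" in each column.
# Assumption: every letter elided as [...] is kept in all columns.
NOT_IN_COLLATION = set("ЪЬ")     # never word-initial
NOT_IN_LISTING = set("ЁЙЪЫЬ")    # dropped from lettered enumeration

COLLATION_FIRST_LETTERS = "".join(c for c in ALPHABET if c not in NOT_IN_COLLATION)
LISTING = "".join(c for c in ALPHABET if c not in NOT_IN_LISTING)

def marker(n: int, letters: str) -> str:
    """Marker for the n-th item (1-based); single letters only."""
    if not 1 <= n <= len(letters):
        raise ValueError("this sketch does not handle overflow past the last letter")
    return letters[n - 1]

print(len(ALPHABET), len(COLLATION_FIRST_LETTERS), len(LISTING))  # 33 31 28
print(marker(7, ALPHABET), marker(7, LISTING))                      # Ё Ж
```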
>
>> In the case of Armenian being identical for the first 10 letters, that may
>> be true, but that's just a coincidence. The two systems, Armenian numeric
>> (A) on one hand and Armenian alphabetic (B/C) on the other, are still
>> fundamentally different.
>
> Right.
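
The Armenian point can be checked mechanically. The sketch below contrasts the additive Armenian numeral system (A) with a simple lettered enumeration (B/C). In the numeral system, the first 36 letters, Ա (U+0531) through Ք (U+0554), stand for 1 to 9, 10 to 90, 100 to 900 and 1000 to 9000. This is roughly what CSS's additive armenian counter style does for 1 to 9999. On the lettered side, the sketch assumes all 38 letters from U+0531 to U+0556 in Unicode order, purely for illustration. Which letters a culturally defined Armenian listing actually uses is exactly the open question here. The function names are hypothetical.

```python
# Armenian capital letters Ա (U+0531) .. Ֆ (U+0556), in Unicode order.
ARMENIAN = "".join(chr(cp) for cp in range(0x0531, 0x0557))

def armenian_numeral(n: int) -> str:
    """Additive Armenian numeral for 1..9999 (system A)."""
    if not 1 <= n <= 9999:
        raise ValueError("this sketch covers 1..9999 only")
    out = []
    # Letters 0-8 are units, 9-17 tens, 18-26 hundreds, 27-35 thousands.
    for rank, scale in ((3, 1000), (2, 100), (1, 10), (0, 1)):
        digit = (n // scale) % 10
        if digit:
            out.append(ARMENIAN[rank * 9 + digit - 1])
    return "".join(out)

def armenian_lettered(n: int) -> str:
    """Lettered enumeration (B/C); assumed 38 letters, single letters only."""
    if not 1 <= n <= len(ARMENIAN):
        raise ValueError("this sketch does not handle overflow")
    return ARMENIAN[n - 1]

same = all(armenian_numeral(n) == armenian_lettered(n) for n in range(1, 11))
print(same)                                          # True
print(armenian_numeral(11), armenian_lettered(11))   # ԺԱ Ի
```

The two systems agree on 1 to 10 only because the numeral values follow alphabet order. At 11 the numeral system adds Ժ (10) and Ա (1), while lettered enumeration simply moves on to the next letter.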
>
>> So I would say we're really dealing with two situations: letter-based
>> numeral systems and alphabetic (more broadly in Unicode terminology:
>> lettered) enumerations.
>
> OK, well, of course. There are two basic kinds: letter-based numeral systems
> and lettered enumerations. But of the latter kind, there are two kinds,
> as well:
>
> 1) Formats based strictly on alphabetical collation, which are fairly simple
> to document as real and existing (but which *may often not* be used much in
> academic and other works that typically use lettered enumeration).
>
> 2) Culturally defined lettered enumerations, which are documented by their
> use in academic works (hence LaTeX etc.) and are also sometimes defined by
> standards committees.
>
> One needs to be aware of both variants in order to get things right. (This
> task is perhaps somewhat simplified for Latin, because of the strong
> position of the Basic Modern Latin alphabet.[1])
>
> [1] http://en.wikipedia.org/wiki/Basic_modern_Latin_alphabet
> --
> leif halvard silli
>
>
Received on Monday, 16 February 2009 08:58:22 UTC