Re: Armenian numbering: findings, recommendations and request to CSS from Robert J Burns on 2009-02-14 (www-style@w3.org from February 2009)

From: Robert J Burns <rob@robburns.com>
Date: Sat, 14 Feb 2009 16:43:25 -0600
To: Leif Halvard Silli <lhs@malform.no>
Cc: W3C Style List <www-style@w3.org>, fantasai <fantasai.lists@inkedblade.net>
Message-Id: <2650F815-2879-4E44-BE81-02B6493F10CB@robburns.com>
Hi Leif,

On Feb 14, 2009, at 3:47 PM, Leif Halvard Silli wrote:

> Robert J Burns 2009-02-14 00.25:
>> Hi Leif and fantasai,
>
>> So that means all CSS needs to do is limit the specific letters  
>> used in an alphabetic (or more precisely in Unicode terms a  
>> lettered) enumeration. Unicode provides the rest. However, as Leif
>> suggested before,
>> some general naming scheme might be needed to allow some way to  
>> express this in a functional form as fantasai suggested (something  
>> to serve as the argument/arguments for an "alpha" or "lettered"  
>> function). So for example lettered(latn-no) or alpha(latn-no) could  
>> indicate the Latin script limited to Norwegian letters and sorted  
>> according to the Unicode Norwegian collation.
>
> Sounds fine with me with alpha(latn-no). But I don't know if
> upper-case/lower-case fits into such a system.

Yes, it occurred to me that I forgot that. I think simply inserting
upper or lower in there would make sense. So alpha(latn-no-upper) or
alpha(latn-no-lower) would work.
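
As a minimal sketch of how such an argument could be taken apart, assuming the hypothetical script-language-case form proposed above (nothing here is defined by CSS; the names AlphaSpec and parse_alpha_argument are made up for illustration):

```python
from typing import NamedTuple, Optional


class AlphaSpec(NamedTuple):
    """Parts of a hypothetical alpha() argument such as 'latn-no-upper'."""
    script: str            # script code, e.g. 'latn'
    language: str          # language code, e.g. 'no'
    case: Optional[str]    # 'upper', 'lower', or None when omitted


def parse_alpha_argument(arg: str) -> AlphaSpec:
    """Split 'script-language[-case]' into its parts.

    The syntax is an assumption taken from this thread, not from any spec.
    """
    parts = arg.strip().lower().split("-")
    if len(parts) == 2:
        script, language = parts
        case = None
    elif len(parts) == 3:
        script, language, case = parts
        if case not in ("upper", "lower"):
            raise ValueError(f"unknown case keyword: {case!r}")
    else:
        raise ValueError(f"expected script-language[-case], got {arg!r}")
    if not script or not language:
        raise ValueError(f"empty script or language in {arg!r}")
    return AlphaSpec(script, language, case)
```

With this sketch, parse_alpha_argument("latn-no-upper") yields the script "latn", the language "no" and the case "upper", while "latn-no" leaves the case unset, which matches scripts that have no case distinction.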

>> The same thing could be accomplished for any Ethiopic-based
>> language (unless there's something else I'm missing there).
>
> If there is one thing that should be relatively easy to document and  
> agree about, then it is about alphabetical collation. Especially so  
> if such collation is documented in Unicode - I am not familiar with  
> that subject.

I was referring to the Unicode Collation Algorithm[1] and the  
associated Unicode Common Locale Data Repository[2]. However, I think  
the part that hasn't been done by Unicode is documenting the subset of  
characters within each script relevant to a particular language using  
that script. There are also many letters within each script that are
specialized letters that probably don't belong in a lettered
enumeration. So, much as CSS3 has already been doing for the list
module, that work would need to be done for every supported language
to make the system complete.
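
To illustrate the two separate pieces (a per-language subset of letters, and a language-specific order for them), here is a rough sketch for Norwegian. The 29-letter alphabet string is hand-written and stands in for what a real implementation would take from the Unicode Collation Algorithm with a CLDR tailoring; the function names are hypothetical.

```python
# Hand-written Norwegian lowercase alphabet in Norwegian order.
# This string is an illustrative stand-in for UCA/CLDR data.
NORWEGIAN_LOWER = "abcdefghijklmnopqrstuvwxyzæøå"


def in_repertoire(text: str, alphabet: str) -> bool:
    """True if every character of text belongs to the language's subset."""
    return all(ch in alphabet for ch in text)


def lettered_sort(items, alphabet):
    """Sort strings by each letter's position in the given alphabet.

    Raises ValueError if a string contains a letter outside the alphabet.
    """
    return sorted(items, key=lambda s: [alphabet.index(ch) for ch in s])


sample = ["å", "z", "ø", "a", "æ"]
by_code_point = sorted(sample)                        # plain code point order
by_language = lettered_sort(sample, NORWEGIAN_LOWER)  # Norwegian order
```

Plain code point order puts å before æ and ø, whereas the Norwegian order is æ, ø, å. The subset check is the part that, as far as I can tell, Unicode does not supply per language.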

>> The interesting part I guess would be to see what languages fell  
>> outside this abstraction and needed further tailoring or its own  
>> approach. However, lettered enumerations seem fundamentally  
>> different from the Roman numeral system (and, it sounds like, also
>> the Armenian numeral system), but I imagine that Armenian (as with
>> Latin) would also enjoy lettered enumerations. Perhaps I'm the
>> only one confusing that here, but I'm having trouble following then.
>
> In a summary, we are juggling with 4 things:
>
> 	Alphabet: The particular alphabet in question
> 	A: classic systems (Roman, Armenian, classical Greek, classical
> Church Slavonic, Georgian) based on letters in place of numbers.
> 	B: alphabetical collation
> 	C: hybrids: saying a, b, c instead of 1, 2, 3

OK, I think we're on the same page then: especially in terms of the A  
vs. B/C distinction. And if I understand correctly much of the  
discussion of Armenian (and mention of Ethiopic) has focussed on the  
A. classic systems. I'm less clear about the distinction between B and  
C.

> In many cases B and C are identical. E.g. we could classify
> "upper-norwegian" as both B and C. And sometimes even A and B may
> overlap, e.g. classical Armenian, for the first 10 letters.
>
> For some Latin alphabets, however, C and B differ, because C
> constitutes a list which is either larger or shorter than B. E.g. if  
> someone uses the A-Z list for C, even if their own alphabet is  
> shorter than that, then they are using A-Z purely as a counting  
> system - without any regard to the letters in their own language.

I would think here B and C are simply the same and that this author in  
using A-Z is really using latn-en-upper and not latn-??-upper,  
where ?? is their own language code.

> E.g. for the Russian Cyrillic alphabet, C constitutes a shorter
> version of that alphabet than B does. {And B), in turn, constitutes
> a shorter version of the Russian Cyrillic alphabet.}

I'm still not understanding the B/C distinction. I think the B and C  
distinction then is more about this limiting of subsets to specific  
languages.  Of course there will be some scripts where the language  
and script are reduced to the same thing (like Hebrew) and also where  
there is no upper and lower case distinction (e.g., Arabic, and also  
Hebrew again).

In the case of Armenian being identical for the first 10 letters, that
may be true, but that's just a particular coincidence. The two systems
are still fundamentally different: Armenian numeric (A) on the one
hand and Armenian alphabetic (B/C) on the other.
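
A small sketch makes the coincidence concrete. It assumes the traditional additive Armenian numerals use the first 36 capital letters (U+0531 to U+0554) as 1-9, 10-90, 100-900 and 1000-9000, and, purely for illustration, that an alphabetic enumeration would cycle through the 38 capitals U+0531 to U+0556. The function names are hypothetical.

```python
# The 38 Armenian capital letters in code point order; using them as
# the alphabetic repertoire is an assumption made for illustration.
ARMENIAN_UPPER = [chr(cp) for cp in range(0x0531, 0x0557)]


def armenian_numeric(n: int) -> str:
    """System A: additive numerals, one letter per non-zero decimal digit."""
    if not 1 <= n <= 9999:
        raise ValueError("this sketch covers 1..9999 only")
    letters = []
    place = 0  # 0 = units, 1 = tens, 2 = hundreds, 3 = thousands
    while n:
        n, digit = divmod(n, 10)
        if digit:
            letters.append(ARMENIAN_UPPER[9 * place + digit - 1])
        place += 1
    return "".join(reversed(letters))


def armenian_alphabetic(n: int) -> str:
    """System B/C: lettered enumeration, like a, b, ..., z, aa, ab, ..."""
    if n < 1:
        raise ValueError("counter must be positive")
    letters = []
    base = len(ARMENIAN_UPPER)
    while n > 0:
        n, index = divmod(n - 1, base)
        letters.append(ARMENIAN_UPPER[index])
    return "".join(reversed(letters))


# Counters for which both systems produce the same marker.
same = [n for n in range(1, 40) if armenian_numeric(n) == armenian_alphabetic(n)]
```

Under these assumptions the two functions agree for 1 through 10 and diverge at 11, where the numeric system combines the letters for 10 and 1 while the alphabetic one simply moves on to the eleventh letter.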

So I would say we're really dealing with two situations: letter-based  
numeral systems and alphabetic (more broadly in Unicode terminology:  
lettered) enumerations.

Take care,
Rob

[1]: <http://www.unicode.org/reports/tr10/>
[2]: <http://www.unicode.org/cldr/> : this includes a generalized  
collation for all of Unicode and then provides language-specific  
exceptions for collation. The exceptions do not limit the characters
to the language in question; they only alter the collation of all
Unicode characters to suit that language.
Received on Saturday, 14 February 2009 22:44:07 UTC