Re: Armenian numbering: findings, recommendations and request to CSS from Leif Halvard Silli on 2009-02-13 (www-international@w3.org from January to March 2009)

From: Leif Halvard Silli <lhs@malform.no>
Date: Fri, 13 Feb 2009 23:18:06 +0100
To: fantasai <fantasai.lists@inkedblade.net>
CC: www-style@w3.org, www-international@w3.org, Håkon Wium Lie <howcome@opera.com>
Message-ID: <4995F19E.1050706@malform.no>

fantasai 2009-02-13 20.32:
> Aryeh Gregor wrote:
>>
>> Also, pragmatically, it would be very cumbersome to add enumeration of
>> all an alphabet's letters for every language people can think up.
>> You'd have to have a different list-style-type for most languages --
>> even Latin-based alphabets differ on what they think the exact set of
>> letters is, and what their order is.  It seems like this would greatly
>> bloat the spec.
> 
> Yeah, I think if we're going down that route we should define keywords
> for the most commonly-used alphabetic orders, and introduce a functional
> notation for everything else. How often do we need, e.g. upper-norwegian,
> given that lists are usually less than 26 letters?
> 
> alpha("a-z")
> alpha("a-f,q-z")
> alpha("do,re,mi,fa,so,la,ti")

Do you use 'alpha' to mean "Latin alphabet"? Or could alpha be 
used for Cyrillic as well? If you are taking your pattern from the 
way RegEx/GREP works, then remember that e.g. \p{Armenian} 
matches any character in the Armenian block.[1]

Hence e.g.
   alpha(armenian)
could also be useful.
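
To illustrate what a script keyword such as alpha(armenian) might 
resolve to, here is a minimal Python sketch. It is only an 
assumption about the behaviour, not a proposal: Python's standard 
"re" module has no \p{...} support, so the sketch filters on 
Unicode character names instead, and the helper name 
letters_with_name_prefix is hypothetical.

```python
import unicodedata


def letters_with_name_prefix(prefix, first=0x0000, last=0xFFFF):
    """Return characters in code point order whose Unicode name starts with prefix.

    Hypothetical helper: a stand-in for a script-based match like \\p{Armenian},
    which Python's built-in re module does not offer.
    """
    return [
        chr(cp)
        for cp in range(first, last + 1)
        # Unnamed or unassigned code points yield "" and are skipped.
        if unicodedata.name(chr(cp), "").startswith(prefix)
    ]


# Candidate marker sequence for an Armenian keyword (capital letters only).
armenian_upper = letters_with_name_prefix("ARMENIAN CAPITAL LETTER")
print(armenian_upper[:3])
```

Note that the result is in code point order, which need not be the 
order a given language uses for its list markers.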

> Unicode can fill in ranges, so unless there are a lot of scripts like
> Ethiopic, where every language seems to have picked its own order for
> the letters, this doesn't have to be that painful.

Let's take one example: the Slovak alphabet, about which Wikipedia 
says: "The lexicographic ordering of the Slovak alphabet is very 
similar to that of English": [2]

   alpha("a-d,dz,e-h,ch,i-z")

And there are several such alphabets.[3] It can be complicated. 
But on the whole, what you propose here would be very good to 
have. I would much rather see this implemented across UAs than 
e.g. "upper-norwegian". (Although I also hope that we can get more 
good keywords.)
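
As a rough illustration of how a UA might expand such a range 
expression into an ordered list of markers, here is a minimal 
Python sketch. The names expand_alpha and marker_for are 
hypothetical, and two behaviours are assumptions rather than 
anything settled in this thread: single-letter ranges are filled 
in code point order, and multi-letter items such as "dz" or "ch" 
count as one marker each.

```python
def expand_alpha(spec):
    """Expand a hypothetical alpha() argument, e.g. "a-d,dz,e-h,ch,i-z"."""
    markers = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            raise ValueError("empty item in alpha() argument")
        parts = item.split("-")
        if len(parts) == 2 and len(parts[0]) == 1 and len(parts[1]) == 1:
            # A range of single characters: fill it in by code point.
            start, end = ord(parts[0]), ord(parts[1])
            if start > end:
                raise ValueError(f"descending range: {item!r}")
            markers.extend(chr(cp) for cp in range(start, end + 1))
        else:
            # Anything else (digraph, syllable, symbol) is one marker.
            markers.append(item)
    return markers


def marker_for(index, markers):
    """Return the marker for a 1-based item index, or None if out of range."""
    if 1 <= index <= len(markers):
        return markers[index - 1]
    return None


slovak = expand_alpha("a-d,dz,e-h,ch,i-z")
print(marker_for(5, slovak))  # dz
```

What should happen once a list runs past the last marker is left 
open here; the sketch simply returns None.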

Btw, why did you pick "alpha"? Why not "numb"? Or do you think 
that e.g. pure symbols should be excluded or have another name?

[1] http://en.wikipedia.org/wiki/Regex#Regular_expressions_and_Unicode
[2] http://en.wikipedia.org/wiki/Slovak_alphabet
[3] http://en.wikipedia.org/wiki/Latin-derived_alphabet#Extended_Latin_alphabet
-- 
leif halvard silli

Received on Friday, 13 February 2009 22:18:48 UTC