- From: Robert J Burns <rob@robburns.com>
- Date: Fri, 13 Feb 2009 17:25:12 -0600
- To: W3C Style List <www-style@w3.org>
- Cc: Leif Halvard Silli <lhs@malform.no>, fantasai <fantasai.lists@inkedblade.net>
Hi Leif and fantasai,

Leif wrote:
> fantasai 2009-02-13 20.32:
>> Aryeh Gregor wrote:
>>> Also, pragmatically, it would be very cumbersome to add enumeration of
>>> all an alphabet's letters for every language people can think up.
>>> You'd have to have a different list-style-type for most languages --
>>> even Latin-based alphabets differ on what they think the exact set of
>>> letters is, and what their order is. It seems like this would greatly
>>> bloat the spec.
>>
>> Yeah, I think if we're going down that route we should define keywords
>> for the most commonly-used alphabetic orders, and introduce a functional
>> notation for everything else. How often do we need, e.g. upper-norwegian,
>> given that lists are usually less than 26 letters?
>>
>>   alpha("a-z")
>>   alpha("a-f,q-z")
>>   alpha("do,re,mi,fa,so,la,ti")
>
> Do you use 'alpha' for "latin alphabet"? Or could alpha be used
> for Cyrillic as well? If you are taking your pattern from the way
> RegEx/GREP is working, then remember that e.g. \p{Armenian}
> matches any character in the Armenian block.[1]
>
> Hence e.g.
>   alpha(armenian)
> could also be useful.
>
>> Unicode can fill in ranges, so unless there are a lot of scripts like
>> Ethiopic, where every language seems to have picked its own order for
>> the letters, this doesn't have to be that painful.
>
> Let's take one example: Slovak alphabet, about which Wikipedia
> says: "The lexicographic ordering of the Slovak alphabet is very
> similar to that of English": [2]
>
>   alpha("a-d,dz,e-h,ch,i-z)
>
> And there are several such alphabets.[3] It can be complicated.
> But on the whole, what you propose here would be very good to
> have. I would much rather see this implemented accross UAs than
> e.g. "upper-norwegian". (Although I also hope that we can get more
> good keywords.)
>
> Btw, why did you pick "alpha"? Why not "numb"? Or do you think
> that e.g. pure symbols should be excluded or have another name?

I think Unicode provides a lot of useful abstractions for putting
something like this together. However, I'm not clear on how you, Leif,
are using number and alpha here either. My understanding is that alpha,
or using letters as an enumeration system, does not treat them as
numbers (except perhaps loosely, since they are enumerating) but still
as letters. From what I can tell, however, Armenian uses letters as
numerals (as Roman numerals do). And since Unicode uses "number" to
categorize specifically graphemes used as numerals (7, 0, ↀ), I think
that is a useful distinction to follow. Also, since Unicode provides
language-specific (not merely script-specific) collations, I don't even
think the Ethiopic case should be particularly troublesome here.

In terms of Unicode abstractions I think what we're looking for is:

1) the designation of a script (e.g., Latin or Ethiopic or Armenian)
   and the letters in that script (general categories "Lu", "Ll" and
   "Lo", leaving out "Lm" and "Lt" since those are not really of
   interest here).

2) the more focussed designation of a language, which would limit the
   script to specific letters (through CSS-provided criteria) and also
   provide a collation from the Unicode collation algorithm collations
   (the Unicode collation alone includes other characters in the set,
   not just letters or letters specific to that language).

So that means all CSS needs to do is limit the specific letters used in
an alphabetic (or, more precisely in Unicode terms, a lettered)
enumeration. Unicode provides the rest. However, as Leif suggested
before, some general naming scheme might be needed to express this in a
functional form as fantasai suggested (something to serve as the
argument or arguments for an "alpha" or "lettered" function). So, for
example, lettered(latn-no) or alpha(latn-no) could indicate the Latin
script limited to Norwegian letters and sorted according to the Unicode
Norwegian collation.
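
To make the idea concrete, here is a rough sketch (in Python, not a
proposal for spec text) of how a tag such as latn-no might resolve to an
ordered letter list written in fantasai's range notation, and how a
list marker could then be picked. The names LETTER_SETS, expand and
marker are hypothetical. The hard-coded Norwegian order stands in for
what would presumably come from the Unicode collation, and the
"aa, ab, ..." behaviour past the end of the list is only an assumption,
since nothing in this thread settles it.

```python
# Hypothetical table: tag -> letter list in the "a-f,q-z" range notation.
# Assumption: the order is hard-coded here; a real implementation would
# presumably take it from the Unicode collation for the language.
LETTER_SETS = {
    "latn": "a-z",
    "latn-no": "a-z,æ,ø,å",  # Norwegian: a-z followed by æ, ø, å
}


def expand(spec):
    """Expand 'a-f,q-z' style notation into an ordered list of symbols."""
    symbols = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # ignore empty entries such as a trailing comma
        # A range is exactly: one character, a hyphen, one character.
        if len(item) == 3 and item[1] == "-":
            start, end = ord(item[0]), ord(item[2])
            symbols.extend(chr(cp) for cp in range(start, end + 1))
        else:
            # Anything else ("dz", "ch", "do") is a single symbol.
            symbols.append(item)
    return symbols


def marker(symbols, n):
    """Return the marker for list item n (1-based).

    Assumption: past the end of the list, markers continue as
    a, b, ..., z, aa, ab, ... (bijective numbering over the symbols).
    """
    if n < 1:
        raise ValueError("list items are numbered from 1")
    k = len(symbols)
    parts = []
    while n > 0:
        n, r = divmod(n - 1, k)
        parts.append(symbols[r])
    return "".join(reversed(parts))


norwegian = expand(LETTER_SETS["latn-no"])
print([marker(norwegian, i) for i in (1, 26, 27, 28, 29, 30)])
```

Because ranges are expanded by code point, the notation only fills in
runs that happen to be contiguous in Unicode. Everything language
specific (æ, ø, å here, or dz and ch for Slovak) still has to be listed
explicitly or come from a collation.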
The same thing could be accomplished for any Ethiopic-based language
(unless there's something else I'm missing there). The interesting part,
I guess, would be to see which languages fall outside this abstraction
and need further tailoring or their own approach.

However, lettered enumerations seem fundamentally different from the
Roman numeral system (and, it sounds like, also from the Armenian
numeral system), though I imagine that Armenian (like Latin) would also
enjoy lettered enumerations. Perhaps I'm the only one confusing the two
here, but I'm having trouble following.

Take care,
Rob
Received on Friday, 13 February 2009 23:25:51 UTC