- From: Robert J Burns <rob@robburns.com>
- Date: Sat, 14 Feb 2009 16:43:25 -0600
- To: Leif Halvard Silli <lhs@malform.no>
- Cc: W3C Style List <www-style@w3.org>, fantasai <fantasai.lists@inkedblade.net>
Hi Leif,

On Feb 14, 2009, at 3:47 PM, Leif Halvard Silli wrote:

> Robert J Burns 2009-02-14 00.25:
>> Hi Leif and fantasai,
>
>> So that means all CSS needs to do is limit the specific letters
>> used in an alphabetic (or more precisely in Unicode terms a
>> lettered) enumeration. Unicode provides the rest. However as Leif
>> suggested before
>> some general naming scheme might be needed to allow some way to
>> express this in a functional form as fantasai suggested (something
>> to serve as the argument/arguments for an "alpha" or "lettered"
>> function). So for example lettered(latn-no) or alpha(latn-no) could
>> indicate the Latin script limited to Norwegian letters and sorted
>> according to the Unicode Norwegian collation.
>
> Sounds fine with me with alpha(latn-no). But I don't know if
> upper-case/lower-case fits in in such a system.

Yes, it occurred to me I forgot that. I think simply inserting upper
or lower in there would make sense, so alpha(latn-no-upper) or
alpha(latn-no-lower) would work.

>> The same thing could be accomplished for any Ethiopic based
>> language (unless there's something else I'm missing there).
>
> If there is one thing that should be relatively easy to document and
> agree about, then it is about alphabetical collation. Especially so
> if such collation is documented in Unicode - I am not familiar with
> that subject.

I was referring to the Unicode Collation Algorithm[1] and the
associated Unicode Common Locale Data Repository[2]. However, I think
the part that hasn't been done by Unicode is documenting the subset of
characters within each script relevant to a particular language using
that script. There are also many letters within each script that are
specialized letters that probably don't belong in a lettered
enumeration. So, much as CSS3 has already been doing for the list
module, that work would need to be completed for every supported
language to make the system complete.
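To make the alpha(latn-no-upper) idea concrete, here is a minimal sketch in Python. Nothing in it comes from the CSS3 drafts: the functions parse_alpha and label, the ALPHABETS table, the default of lower case when no case is given, and the spreadsheet-style continuation (AA, AB, ...) after the last letter are all assumptions made for illustration. The table holds exactly the per-language letter subsets that would still have to be compiled.

```python
import re

# Hypothetical per-language letter subsets, already in collation order.
# Compiling such a table for every supported language is the open task.
ALPHABETS = {
    ("latn", "en"): "abcdefghijklmnopqrstuvwxyz",
    ("latn", "no"): "abcdefghijklmnopqrstuvwxyzæøå",
}

# Assumed syntax: alpha(<script>-<language>[-upper|-lower])
_ALPHA_RE = re.compile(r"alpha\(([a-z]{4})-([a-z]{2,3})(?:-(upper|lower))?\)")


def parse_alpha(expr):
    """Split e.g. 'alpha(latn-no-upper)' into (script, language, case)."""
    match = _ALPHA_RE.fullmatch(expr)
    if match is None:
        raise ValueError(f"not an alpha() expression: {expr!r}")
    script, language, case = match.groups()
    return script, language, case or "lower"  # assumed default: lower


def label(n, expr):
    """Return the marker for list item n (1-based) under the given alpha()."""
    if n < 1:
        raise ValueError("list items are numbered from 1")
    script, language, case = parse_alpha(expr)
    letters = ALPHABETS[(script, language)]
    base = len(letters)
    out = []
    # Bijective base-N numbering: a..å, then aa, ab, ... (an assumption).
    while n > 0:
        n, remainder = divmod(n - 1, base)
        out.append(letters[remainder])
    marker = "".join(reversed(out))
    return marker.upper() if case == "upper" else marker
```

With this table, item 27 comes out as Æ under alpha(latn-no-upper) but as aa under alpha(latn-en-lower), which is the practical effect of limiting the letters per language.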

>> The interesting part I guess would be to see what languages fell
>> outside this abstraction and needed further tailoring or its own
>> approach. However, lettered enumerations seem fundamentally
>> different than the Roman numeral system (and it sounds like also the
>> Armenian numeral system), but I imagine that both Armenian (as for
>> Latin) would enjoy also lettered enumerations. Perhaps I'm the
>> only one confusing that here, but I'm having trouble following then.
>
> In a summary, we are juggling with 4 things:
>
> Alphabet: The particular alphabet in question
> A: classic systems (Roman, Armenian, classical Greek, classical
>    Church Slavonic, Georgian) based on letters in place of numbers.
> B: alphabetical collation
> C: hybrids: saying a, b, c instead of 1, 2, 3

OK, I think we're on the same page then, especially in terms of the
A vs. B/C distinction. And if I understand correctly, much of the
discussion of Armenian (and mention of Ethiopic) has focussed on A,
the classic systems. I'm less clear about the distinction between B
and C.

> In many cases B and C are identical. E.g. we could classify
> "upper-norwegian" as both B and C. And sometimes even A and B may
> also overlap. E.g. classical Armenian, for the 10 first letters.
>
> For some Latin alphabets, however, C and B differ, because C
> constitutes a list which is either larger or shorter than B. E.g. if
> someone uses the A-Z list for C, even if their own alphabet is
> shorter than that, then they are using A-Z purely as a counting
> system - without any regard to the letters in their own language.

I would think here B and C are simply the same and that this author,
in using A-Z, is really using latn-en-upper and not latn-??-upper,
where ?? is their own language code.

> E.g. for the Russian Cyrillic alphabet, C constitutes a shorter
> version of that alphabet than B does.
> (And B, in turn, constitutes a shorter version of the Russian
> Cyrillic alphabet.)

I'm still not understanding the B/C distinction. I think the B and C
distinction then is more about this limiting of subsets to specific
languages. Of course there will be some scripts where the language and
script are reduced to the same thing (like Hebrew) and also where there
is no upper and lower case distinction (e.g., Arabic, and also Hebrew
again).

In the case of Armenian being identical for the first 10 letters, that
may be true, but that's just a particular coincidence. The two systems
are still fundamentally different: Armenian numeric (A) on the one hand
and Armenian alphabetic (B/C) on the other. So I would say we're really
dealing with two situations: letter-based numeral systems and
alphabetic (more broadly, in Unicode terminology, lettered)
enumerations.

Take care,
Rob

[1]: <http://www.unicode.org/reports/tr10/>
[2]: <http://www.unicode.org/cldr/> : this includes a generalized
collation for all of Unicode and then provides language-specific
exceptions for collation. The exceptions do not limit the characters to
the language in question, but only alter the collation of all Unicode
characters to match the collation relevant to that language.
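
As an illustration of the two situations distinguished above, the following Python sketch contrasts an Armenian letter-based numeral (A) with an Armenian lettered enumeration (B/C). It assumes the 36-letter classical alphabet, taken from the Unicode range U+0531 to U+0554, with the traditional additive values (units, tens, hundreds, thousands). The function names are hypothetical, and whether a modern lettered list should also include the two later letters is left open.

```python
# The 36 classical Armenian capitals, U+0531 (Ա) through U+0554 (Ք),
# which Unicode encodes in alphabetical order.
ARMENIAN = [chr(0x0531 + i) for i in range(36)]


def armenian_numeral(n):
    """System A: additive numeral; letters stand for 1-9, 10-90, and so on."""
    if not 1 <= n <= 9999:
        raise ValueError("representable range is 1..9999")
    out = []
    for power in (3, 2, 1, 0):  # thousands, hundreds, tens, units
        digit = (n // 10 ** power) % 10
        if digit:
            # Each block of nine letters covers one decimal position.
            out.append(ARMENIAN[9 * power + digit - 1])
    return "".join(out)


def armenian_lettered(n):
    """System B/C: the nth letter of the alphabet, nothing more."""
    if not 1 <= n <= len(ARMENIAN):
        raise ValueError("this sketch does not continue past the last letter")
    return ARMENIAN[n - 1]


if __name__ == "__main__":
    for n in (1, 9, 10, 11, 20):
        print(n, armenian_numeral(n), armenian_lettered(n))
```

The two functions agree for items 1 through 10 and diverge at item 11, where the numeral combines the letters for 10 and 1 while the lettered list simply moves on to the eleventh letter.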
Received on Saturday, 14 February 2009 22:44:07 UTC