Re: css3-lists: Call For Comments

Greetings,

Some much-delayed comments on the Ethiopic list styles from the Nov-7 TR.
The notes below make some corrections, answer questions in the TR, ask some
new questions, and add some data.



TR Corrections
==============

While proofreading all Ethiopic lists I discovered that U+1210 should not
be in the Sidama list; it should be removed in the next draft.

In all of the halehame lists U+1330 should precede U+1338.  They are
reversed in the sidama, tigre and oromo lists. The reversal *is* 
correct in Blin.

The Afar list should include U+12F8 after U+12F0.


Suffix
------

Concerning the suffix for Ethiopic lists, there is no strong preference.
I encounter U+002F more often than U+1366 (as used in the TR), but only
slightly more often. Hence, I don't think a strong preference for suffix
choice can be demonstrated from literature.  This isn't critical so long
as the CSS spec allows designers to set their suffix of choice.  I've used
U+002F in the algorithms below, feel free to change it.


Ethiopic-Numeric
----------------
Answering the boxed question in the TR, the best suffix for ethiopic-numeric
is U+1361.  In this case there is a strong suffix preference.

Otherwise a tweak is needed in the algorithm.  All things considered, the
simplest tweak would be to adjust step 3 to:

3. If the group has an odd number (as given in the previous step) and has the value 1, or if the group is the most significant one and has the value 1, or if the group has the value zero, then remove the digit (but leave the group if the group has an even number, so it still has a separator appended below).


which adds the condition "...if the group has an even number" for holding onto
a group.  I've walked through this adjusted algorithm with the example numbers
(Ian, remember that 780000001092 was a typo for 780100000092) and it generates
the proper values.
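
A minimal Python sketch of how the adjusted step 3 might sit inside the whole
conversion.  Only step 3 is quoted in this message; the other steps (two-digit
grouping, the special case for the value 1, U+137B after odd groups and U+137C
after even groups) are assumptions taken from the usual description of the
ethiopic-numeric algorithm, and the function name is hypothetical.  The sketch
also assumes that an odd group whose value is 1 still receives its separator,
and that only zero-valued odd groups disappear entirely.

```python
# Ethiopic digits: U+1369..U+1371 are 1..9, U+1372..U+137A are 10..90.
ONES = [""] + [chr(0x1369 + i) for i in range(9)]
TENS = [""] + [chr(0x1372 + i) for i in range(9)]
HUNDRED = "\u137B"        # separator after odd-numbered groups
TEN_THOUSAND = "\u137C"   # separator after even-numbered groups (except group 0)


def ethiopic_numeric(n: int) -> str:
    """Sketch of ethiopic-numeric with the adjusted step 3 (no suffix added)."""
    if n < 1:
        raise ValueError("defined only for positive numbers")
    if n == 1:
        return ONES[1]  # assumed special case: 1 on its own keeps its digit

    # Split into two-digit groups, least significant first (group 0).
    groups = []
    while n > 0:
        n, group = divmod(n, 100)
        groups.append(group)

    out = []
    last = len(groups) - 1
    for idx in range(last, -1, -1):
        value = groups[idx]
        odd = idx % 2 == 1
        # Adjusted step 3: drop the digit for zero groups, for a leading
        # group of value 1, and for odd groups of value 1.
        drop_digit = value == 0 or (value == 1 and (odd or idx == last))
        if not drop_digit:
            out.append(TENS[value // 10] + ONES[value % 10])
        if odd:
            if value != 0:          # zero-valued odd groups vanish entirely
                out.append(HUNDRED)
        elif idx != 0:              # even groups always keep their separator
            out.append(TEN_THOUSAND)
    return "".join(out)


if __name__ == "__main__":
    # Print the code points produced for the corrected example number.
    print(" ".join(f"U+{ord(c):04X}" for c in ethiopic_numeric(780100000092)))
```

The suffix (U+1361, as suggested above) is left out so that the separator
logic stays visible; a caller would append it to the returned string.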



Additional Ethiopic List Style Algorithms
=========================================

Concerning qualifications for the list styles that follow: they would be
as valid as the styles other than ge'ez, amharic and tigrigna in the present
draft proposal.  Like the afar, oromo, sidama, somali, and tigre styles in the
present draft, the information on the following is garnered from Ethiopic
phonology tables and from literacy and orthography studies conducted over the
last 20 years.  So while a literacy commission proposed an Ethiopic character
set for Afar, for example, whether or not the Afar later adopted the character
set as shown I cannot say with any certainty.  All of these groups, with no
doubt whatsoever, have used Ethiopic at one time or another, but plausibly
could have modified what was proposed in the referenced investigations.

References for the list styles can be found here:
http://www.ethiopic.org/Collation/OrderedLists.html

The blin style has been verified for list context by a Blin standards
group.  The Agaw, Harari, Me'en, and Silti list styles rely on
references or direct communication with regional government offices
that are only a few years old.

Omitted in the following are list styles for Bench and Sebatbeit.  These
languages require characters that are not yet in Unicode and likely will not
be before version 6.0.  They could, however, be supported now using a subset
of the available Unicode characters, but some note would have to be made
that the list styles are subject to a later revision.  What would policy
dictate here?

agaw is defined as an alphabetic system (numeric repeating with no
insignificant 0 value) defined for all positive numbers greater than zero,
using codepoints U+1200, U+1208, U+1210, U+1218, U+1228, U+1230, U+1238,
U+1240, U+1250, U+1260, U+1268, U+1270, U+1278, U+1290, U+1298, U+12A0,
U+12A8, U+12B8, U+12C8, U+12D0, U+12D8, U+12E0, U+12E8, U+12F0, U+1300,
U+1308, U+1318, U+1320, U+1328, U+1330, U+1338, U+1348, U+1350,
with a base of 33, a suffix of U+002F, and no exceptions.

ari is defined as an alphabetic system (numeric repeating with no
insignificant 0 value) defined for all positive numbers greater than zero,
using codepoints U+1200, U+1208, U+1218, U+1228, U+1230, U+1238, U+1260,
U+1268, U+1270, U+1278, U+1290, U+12A0, U+12A8, U+12B8, U+12C8, U+12D0,
U+12D8, U+12E0, U+12E8, U+12F0, U+12F8, U+1300, U+1308, U+1328, U+1340,
U+1350, with a base of 26, a suffix of U+002F, and no exceptions.

blin is defined as an alphabetic system (numeric repeating with no
insignificant 0 value) defined for all positive numbers greater than zero,
using codepoints U+1200, U+1208, U+1210, U+1218, U+1230, U+1238, U+1228,
U+1240, U+1250, U+1260, U+1270, U+1290, U+12A0, U+12A8, U+12B8, U+12C8,
U+12D0, U+12E8, U+12F0, U+1300, U+1308, U+1318, U+1320, U+1328, U+1348,
U+12D8, U+12E0, U+1278, U+1298, U+1338, U+1330, U+1350, U+1268, with a base
of 33, a suffix of U+002F, and no exceptions.

dizi is defined as an alphabetic system (numeric repeating with no
insignificant 0 value) defined for all positive numbers greater than zero,
using codepoints U+1200, U+1208, U+1218, U+1228, U+1230, U+1238, U+1240,
U+1260, U+1270, U+1278, U+1290, U+1298, U+12A0, U+12A8, U+12C8, U+12D8,
U+12E0, U+12E8, U+12F0, U+1300, U+1308, U+1320, U+1328, U+1338, U+1340,
U+1348, with a base of 26, a suffix of U+002F, and no exceptions.

gedeo is defined as an alphabetic system (numeric repeating with no
insignificant 0 value) defined for all positive numbers greater than zero,
using codepoints U+1200, U+1208, U+1218, U+1228, U+1230, U+1238, U+1240,
U+1260, U+1270, U+1278, U+1290, U+12A0, U+12A8, U+12C8, U+12E8, U+12F0,
U+1300, U+1308, U+1320, U+1328, U+1330, U+1338, U+1348, U+1350, with a base
of 24, a suffix of U+002F, and no exceptions.

gumuz is defined as an alphabetic system (numeric repeating with no
insignificant 0 value) defined for all positive numbers greater than zero,
using codepoints U+1200, U+1210, U+1208, U+1210, U+1218, U+1228, U+1230,
U+1238, U+1240, U+1260, U+1268, U+1270, U+1278, U+1290, U+1298, U+12A0,
U+12A8, U+12C8, U+12D0, U+12D8, U+12E0, U+12E8, U+12F0, U+12F8, U+1308,
U+1328, U+1330, U+1340, U+1350, with a base of 29, a suffix of U+002F, and
no exceptions.

hadiyya is defined as an alphabetic system (numeric repeating with no
insignificant 0 value) defined for all positive numbers greater than zero,
using codepoints U+1200, U+1208, U+1218, U+1228, U+1230, U+1238, U+1240,
U+1260, U+1270, U+1278, U+1290, U+12A0, U+12A8, U+12C8, U+12D8, U+12E8,
U+12F0, U+1300, U+1308, U+1320, U+1328, U+1330, U+1348, U+1350, with a base
of 24, a suffix of U+002F, and no exceptions.

harari is defined as an alphabetic system (numeric repeating with no
insignificant 0 value) defined for all positive numbers greater than zero,
using codepoints U+1210, U+1208, U+1218, U+1228, U+1230, U+1238, U+1240,
U+1260, U+1270, U+1278, U+1290, U+1298, U+12A0, U+12A8, U+12B8, U+12C8,
U+12E0, U+12E8, U+12F0, U+1300, U+1308, U+1320, U+1328, U+1348, with a base
of 24, a suffix of U+002F, and no exceptions.

kaffa is defined as an alphabetic system (numeric repeating with no
insignificant 0 value) defined for all positive numbers greater than zero,
using codepoints U+1200, U+1208, U+1210, U+1218, U+1220, U+1228, U+1230,
U+1238, U+1240, U+1260, U+1270, U+1278, U+1280, U+1290, U+12A0, U+12A8,
U+12C8, U+12D0, U+12E8, U+12F0, U+1300, U+1308, U+1320, U+1328, U+1330,
U+1348, U+1350, with a base of 27, a suffix of U+002F, and no exceptions.

kebena is defined as an alphabetic system (numeric repeating with no
insignificant 0 value) defined for all positive numbers greater than zero,
using codepoints U+1200, U+1208, U+1218, U+1228, U+1230, U+1238, U+1240,
U+1260, U+1270, U+1278, U+1290, U+12A0, U+12A8, U+12C8, U+12D0, U+12D8,
U+12E0, U+12E8, U+12F0, U+1300, U+1308, U+1320, U+1328, U+1330, U+1348,
U+1350, with a base of 26, a suffix of U+002F, and no exceptions.

kembata is defined as an alphabetic system (numeric repeating with no
insignificant 0 value) defined for all positive numbers greater than zero,
using codepoints U+1200, U+1208, U+1218, U+1228, U+1230, U+1238, U+1240,
U+1260, U+1268, U+1270, U+1278, U+1290, U+12A0, U+12A8, U+12C8, U+12D8,
U+12E8, U+12F0, U+1300, U+1308, U+1320, U+1328, U+1330, U+1348, U+1350,
with a base of 25, a suffix of U+002F, and no exceptions.

konso is defined as an alphabetic system (numeric repeating with no
insignificant 0 value) defined for all positive numbers greater than zero,
using codepoints U+1200, U+1208, U+1218, U+1228, U+1230, U+1238, U+1240,
U+1260, U+1270, U+1278, U+1290, U+1298, U+12A0, U+12A8, U+12B8, U+12C8,
U+12D0, U+12E8, U+12F0, U+1300, U+1348, U+1350, with a base of 22, a suffix
of U+002F, and no exceptions.

kunama is defined as an alphabetic system (numeric repeating with no
insignificant 0 value) defined for all positive numbers greater than zero,
using codepoints U+1200, U+1208, U+1218, U+1228, U+1230, U+1238, U+1260,
U+1270, U+1278, U+1290, U+1298, U+12A0, U+12A8, U+12B8, U+12C8, U+12E8,
U+12F0, U+1300, U+1308, U+1348, with a base of 20, a suffix of U+002F, and
no exceptions.

meen is defined as an alphabetic system (numeric repeating with no
insignificant 0 value) defined for all positive numbers greater than zero,
using codepoints U+1200, U+1208, U+1218, U+1228, U+1230, U+1238, U+1240,
U+1260, U+1270, U+1278, U+1280, U+1290, U+1298, U+12A0, U+12A8, U+12C8,
U+12D8, U+12E8, U+12F0, U+1300, U+1308, U+1320, U+1328, U+1330, U+1350,
U+12F8, U+1340, with a base of 27, a suffix of U+002F, and no exceptions.

saho is defined as an alphabetic system (numeric repeating with no
insignificant 0 value) defined for all positive numbers greater than zero,
using codepoints U+1200, U+1208, U+1210, U+1218, U+1228, U+1230, U+1240,
U+1260, U+1270, U+1290, U+12A0, U+12A8, U+12C8, U+12D0, U+12D8, U+12E8,
U+12F0, U+1308, U+1320, U+1328, U+1330, U+1338, U+1348, with a base of
23, a suffix of U+002F, and no exceptions.

silti is defined as an alphabetic system (numeric repeating with no
insignificant 0 value) defined for all positive numbers greater than zero,
using codepoints U+1200, U+1208, U+1218, U+1228, U+1230, U+1238, U+1240,
U+1260, U+1270, U+1278, U+1290, U+1298, U+12A0, U+12A8, U+12B8, U+12C8,
U+12D8, U+12E0, U+12E8, U+12F0, U+1300, U+1308, U+1320, U+1328, U+1330,
U+1348, U+1350, with a base of 27, a suffix of U+002F, and no exceptions.

wolaita is defined as an alphabetic system (numeric repeating with no
insignificant 0 value) defined for all positive numbers greater than zero,
using codepoints U+1200, U+1208, U+1218, U+1228, U+1230, U+1238, U+1240,
U+1260, U+1270, U+1278, U+1290, U+1298, U+12A0, U+12A8, U+12C8, U+12D8,
U+12E0, U+12E8, U+12F0, U+12F8, U+1300, U+1308, U+1320, U+1328, U+1330,
U+1338, U+1340, U+1348, U+1350, with a base of 29, a suffix of U+002F, and
no exceptions.

yemsa is defined as an alphabetic system (numeric repeating with no
insignificant 0 value) defined for all positive numbers greater than zero,
using codepoints U+1200, U+1208, U+1218, U+1228, U+1230, U+1238, U+1240,
U+1260, U+1268, U+1270, U+1278, U+1290, U+1298, U+12A0, U+12A8, U+12C8,
U+12D8, U+12E0, U+12E8, U+12F0, U+1300, U+1308, U+1318, U+1320, U+1328,
U+1330, U+1348, U+1350, with a base of 28, a suffix of U+002F, and no
exceptions.
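
The "alphabetic system" wording used for each style above (numeric repeating
with no insignificant 0 value) can be sketched in Python as follows.  This is
an illustration under assumptions rather than normative text: the function
name is hypothetical, and the kunama code points are copied from the
definition above purely as sample data.

```python
# Code points of the kunama style, in list order (base 20).
KUNAMA = [
    0x1200, 0x1208, 0x1218, 0x1228, 0x1230, 0x1238, 0x1260,
    0x1270, 0x1278, 0x1290, 0x1298, 0x12A0, 0x12A8, 0x12B8, 0x12C8, 0x12E8,
    0x12F0, 0x1300, 0x1308, 0x1348,
]


def alphabetic_marker(value: int, codepoints: list[int], suffix: str = "\u002F") -> str:
    """Render a list item value in an alphabetic system with no zero digit."""
    if value < 1:
        raise ValueError("alphabetic systems are defined only for values >= 1")
    base = len(codepoints)  # the base is simply the number of symbols
    symbols = []
    while value > 0:
        # Subtracting 1 first is what removes the zero digit: values 1..base
        # map to single symbols, base+1 starts the two-symbol markers.
        value, digit = divmod(value - 1, base)
        symbols.append(chr(codepoints[digit]))
    return "".join(reversed(symbols)) + suffix


if __name__ == "__main__":
    for item in (1, 2, 20, 21, 22):
        print(item, alphabetic_marker(item, KUNAMA))
```

Because the base is taken from the length of the list, the same function can
be used to check that a stated base agrees with the number of code points
given for a style.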


Non-Ethiopic Ethiopian List Styles
==================================

Qubee List Styles
-----------------

Qubee is a writing system based on the Roman alphabet used by the Oromo of
Ethiopia in the regional government, legal and school systems.  Qubee dates
back to the 70s but official use did not begin until the last change in the
national government at the start of the 90s.  The Qubee Alphabet (also the
collation order):

A, AA, B, C, D, E, EE, F, G, H, I, II,  J,  K,  L,  M,  N, O, OO,
P,  Q, R, S, T, U, UU, V, W, X, Y,  Z, CH, DH, KH, NY, PH, SH

"V" and "Z" are kept for non-Oromo transcriptions.  "KH" after "DH" is also
added for transcriptions from Arabic.

A complication appears when a list length exceeds the base size.  For example,
a value of 38 for upper-oromo-qubee would be "AA" which, out of context, could
be mistaken for the list item value of 2.  There is no existing rule for how
to avoid this conflict.  Suggested solutions are either to do nothing and
rely on the context (assume list values do not appear in isolation away from
the list) or to add a space or punctuation as a delimiter.  Relying on the
list context is preferred and would be easiest to implement, since it requires
no special treatment; otherwise a comma (U+002C, ",") is the suggested cycle
delimiter.
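
The collision between 2 and 38 can be reproduced with a short Python sketch.
The symbol list is the Qubee alphabet given above; the function name and the
`delimiter` parameter are hypothetical, and the list suffix is omitted for
brevity.

```python
# The 37 Qubee symbols in collation order; several are two letters long.
UPPER_QUBEE = [
    "A", "AA", "B", "C", "D", "E", "EE", "F", "G", "H", "I", "II",
    "J", "K", "L", "M", "N", "O", "OO", "P", "Q", "R", "S", "T",
    "U", "UU", "V", "W", "X", "Y", "Z", "CH", "DH", "KH", "NY", "PH", "SH",
]


def qubee_marker(value: int, symbols: list[str] = UPPER_QUBEE, delimiter: str = "") -> str:
    """Alphabetic marker built from multi-letter symbols.

    `delimiter` is placed between symbols once the value exceeds the base,
    which is the optional cycle delimiter discussed above.
    """
    if value < 1:
        raise ValueError("defined only for values >= 1")
    base = len(symbols)
    parts = []
    while value > 0:
        value, digit = divmod(value - 1, base)
        parts.append(symbols[digit])
    return delimiter.join(reversed(parts))


if __name__ == "__main__":
    print(qubee_marker(2))                   # single symbol "AA"
    print(qubee_marker(38))                  # two symbols "A" + "A"
    print(qubee_marker(38, delimiter=","))   # disambiguated with U+002C
```

With an empty delimiter the markers for 2 and 38 are indistinguishable as
text; passing "," separates the two symbols of 38.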


upper-oromo-qubee is defined as an alphabetic system (numeric repeating
with no insignificant 0 value) defined for all positive numbers greater than
zero, using codepoints U+0041, U+0041U+0041, U+0042, U+0043, U+0044, U+0045,
U+0045U+0045, U+0046, U+0047, U+0048, U+0049, U+0049U+0049, U+004A, U+004B,
U+004C, U+004D, U+004E, U+004F, U+004FU+004F, U+0050, U+0051, U+0052, U+0053,
U+0054, U+0055, U+0055U+0055, U+0056, U+0057, U+0058, U+0059, U+005A,
U+0043U+0048, U+0044U+0048, U+004BU+0048, U+004EU+0059, U+0050U+0048,
U+0053U+0048, with a base of 37, a suffix of U+002E, and no exceptions.

lower-oromo-qubee is defined as an alphabetic system (numeric repeating
with no insignificant 0 value) defined for all positive numbers greater than
zero, using codepoints U+0061, U+0061U+0061, U+0062, U+0063, U+0064, U+0065,
U+0065U+0065, U+0066, U+0067, U+0068, U+0069, U+0069U+0069, U+006A, U+006B,
U+006C, U+006D, U+006E, U+006F, U+006FU+006F, U+0070, U+0071, U+0072, U+0073,
U+0074, U+0075, U+0075U+0075, U+0076, U+0077, U+0078, U+0079, U+007A,
U+0063U+0068, U+0064U+0068, U+006BU+0068, U+006EU+0079, U+0070U+0068,
U+0073U+0068, with a base of 37, a suffix of U+002E, and no exceptions.

Received on Friday, 6 December 2002 15:49:20 UTC