General: African languages

WRT today's teleconference:

What follows is very very brief. If people want more details or have 
specific questions, please let me know.

African languages fall into four categories:

1) languages supported by Unicode.
E.g. Hausa and Pulaar (using Latin script).
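
For instance, a UTF-8 page can carry the hooked letters directly, or via 
numeric character references. A rough sketch (the sample word and the 
markup are mine, not taken from any of the material above):

  <p lang="ha">&#x0199;asa</p>  <!-- U+0199 LATIN SMALL LETTER K WITH HOOK -->
  <p lang="ha">ƙasa</p>         <!-- the same text entered directly as UTF-8 -->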

2) languages supported by Unicode, but requiring additional support in 
rendering systems.

E.g. Yoruba, Ife, Dinka, Nuer, etc.

This can include correct placement of combining diacritics based on 
languages' typographic conventions, or stacking of combining diacritics. 
Ife offers a challenging example.
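
As a rough illustration of the kind of sequences a rendering engine has 
to get right (the sample sequences are my own, not taken from the notes 
that follow):

  <!-- base letter + combining dot below + combining acute accent;
       the marks must be placed below and above the base respectively -->
  <span lang="yo">o&#x0323;&#x0301;</span>

  <!-- base letter + combining tilde + combining grave accent;
       both marks sit above the base and must stack cleanly -->
  <span>&#x025B;&#x0303;&#x0300;</span>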

Some notes under construction that may illustrate some of the issues:

http://www.openroad.net.au/languages/african/ife-2.html
http://www.openroad.net.au/languages/african/dinka-4.html

This is an issue for font rendering technologies (AAT/ATSUI, Uniscribe 
and Graphite, for example). OpenType provides mark positioning lookups 
(MarkToBase and MarkToMark) designed for the correct placement of 
combining diacritics. Support for this in Uniscribe is currently under 
development. (I'm not sure of the status of AAT/ATSUI in this regard.)

In some cases (Dinka and Nuer, for instance) the combining diacritics 
in existing fonts are adequate, if not optimal, for lowercase 
characters, but entirely unsuitable for uppercase characters. In other 
cases, such as Ife, where diacritic stacking is required, the lack of 
support is a crucial concern that will only be eased once new versions 
of the font rendering technologies become widespread.

Additionally, some African languages use alternative glyph forms for 
certain characters (the most common example is uppercase ENG). It is 
possible to provide alternative glyphs for different languages and 
typographic traditions within an OpenType font. Unfortunately, current 
software cannot interact with the font rendering systems well enough to 
make use of such language-specific features within fonts.
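
For example, a Pulaar page and a Sami page may both contain U+014A 
(LATIN CAPITAL LETTER ENG) yet call for different glyph shapes; as I 
understand it, African orthographies generally prefer the "N-form" 
while Sami uses the "n-form". All the markup itself can do is label the 
language:

  <p lang="ff">&#x014A;</p>  <!-- Fulah/Pulaar: ideally the N-shaped glyph -->
  <p lang="se">&#x014A;</p>  <!-- Northern Sami: ideally the n-shaped glyph -->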

At least that's my current understanding.

3) languages that have some characters that are not present in Unicode.
E.g. Dagera (Burkina Faso), Hausa/Pulaar/etc. in Ajami (Arabic script).

There has been a fair amount of discussion recently on Ajami on the 
Unicode-Afrique, A12N Collaboration and H-Hausa mailing lists.

4) scripts currently not supported by Unicode.
E.g. N'ko, Vai, Tifinagh, etc.

With respect to HTML, the issues include how to identify a language 
when it has no ISO 639-1 code or IANA-registered language tag. How 
should the "x-" convention be used in practical settings?

For an example:

http://home.vicnet.net.au/~andrewc/samples/nuer.htm

I've used the convention "x-sil-" to indicate an Ethnologue language 
code, although that's neither here nor there.
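
For what it's worth, the convention amounts to markup along these lines 
(NUS being, if I remember it correctly, the Ethnologue code for Nuer):

  <p lang="x-sil-NUS">...</p>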

Other key issues include charset identification in the absence of 
"defined" character encodings.

A useful starting point is the "A12N gateway" http://www.bisharat.net/A12N/

Andrew

Andrew Cunningham
Multilingual Technical Officer
OPT, Vicnet,
State Library of Victoria
Australia

andrewc@vicnet.net.au
