- From: Andrew Cunningham <andrewc@mail.vicnet.net.au>
- Date: Thu, 19 Dec 2002 10:28:44 +1100
- To: public-i18n-geo@w3.org
WRT today's teleconference: what follows is very brief. If people want more detail or have specific questions, please let me know.

African languages fall into four categories:

1) Languages supported by Unicode, e.g. Hausa and Pulaar (using the Latin script).

2) Languages supported by Unicode but requiring additional support in rendering systems, e.g. Yoruba, Ife, Dinka, Nuer, etc. This can include correct placement of combining diacritics according to a language's typographic conventions, or stacking of combining diacritics. Ife offers a challenging example. Some notes under construction that may illustrate some of the issues:

http://www.openroad.net.au/languages/african/ife-2.html
http://www.openroad.net.au/languages/african/dinka-4.html

This is an issue for font rendering technologies (AAT/ATSUI, Uniscribe and Graphite, for example). OpenType has features (e.g. MarkToBase, MarkToMark) designed for correct positioning of combining diacritics. Support for these in Uniscribe is currently under development. (I'm not sure of the status of AAT/ATSUI in this regard.)

In some cases (Dinka and Nuer, for instance) the existing combining diacritics in some fonts are adequate for lowercase characters, though not optimal, but entirely unsuitable for uppercase characters. In other cases, like Ife, where diacritic stacking is required, it is a crucial concern, one that will be alleviated when the new versions of the font rendering technologies become widespread.

Additionally, some African languages use alternative glyphs for certain characters (the most common example is uppercase ENG). It is possible to create alternative glyphs for different languages/typographic traditions within an OpenType font. Unfortunately, current software is unable to interact sufficiently with the font rendering systems to allow use of language-specific features within fonts. At least, that's my current understanding.

3) Languages that have some characters not present in Unicode. E.g.
Dagera (Burkina Faso), and Hausa/Pulaar/etc. in Ajami (Arabic script). There has been a fair amount of discussion on Ajami recently on the Unicode-Afrique, A12N Collaboration and H-Hausa mailing lists.

4) Scripts currently not supported by Unicode, e.g. N'ko, Vai, Tifinagh, etc.

With respect to HTML, one issue is how to identify languages when there is no ISO 639-1 code or IANA language code. How should the "x-" convention be used in practical settings? For an example:

http://home.vicnet.net.au/~andrewc/samples/nuer.htm

I've used the convention "x-sil-" to indicate an Ethnologue language code, although that's neither here nor there. Other key issues include charset identification in the absence of "defined" character encodings.

A useful starting point is the "A12N gateway":

http://www.bisharat.net/A12N/

Andrew

Andrew Cunningham
Multilingual Technical Officer
OPT, Vicnet, State Library of Victoria
Australia
andrewc@vicnet.net.au
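To make the diacritic-stacking point in category 2 above concrete, here is a small Python sketch (my own illustration, not part of the original discussion). It shows that for a Yoruba-style vowel carrying both a dot below and a grave tone mark, the Unicode data itself is well-defined: canonical normalization gives one stable form regardless of typing order, and the tone mark remains a combining character that the font/renderer must position, which is exactly where MarkToBase/MarkToMark support matters.

```python
import unicodedata

# A vowel with dot below plus a grave tone mark can be typed as several
# different code-point sequences; canonical normalization makes them
# comparable. (Yoruba-style example; a sketch, not from the mail itself.)

# o + COMBINING DOT BELOW (U+0323) + COMBINING GRAVE ACCENT (U+0300)
seq1 = "o\u0323\u0300"
# The same two marks typed in the opposite order
seq2 = "o\u0300\u0323"

nfc1 = unicodedata.normalize("NFC", seq1)
nfc2 = unicodedata.normalize("NFC", seq2)

# Both compose to U+1ECD (o with dot below) followed by U+0300; there is
# no precomposed character for the full combination, so the grave stays
# a combining mark that the rendering engine has to place correctly.
print(nfc1 == nfc2)                    # True
print([hex(ord(c)) for c in nfc1])     # ['0x1ecd', '0x300']
```

So the storage and comparison side of the problem is solved by normalization; what lags, as the mail notes, is rendering-engine support for positioning the leftover combining marks.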
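The "x-sil-" convention mentioned above can also be sketched in a few lines of Python (again my own illustration, with hypothetical helper names). RFC 3066, the language-tag RFC current at the time, reserves tags whose first subtag is "x" for private use, with each subtag limited to eight alphanumeric characters; a tag like "x-sil-" plus an Ethnologue code fits that grammar.

```python
import re

# RFC 3066 private-use tags: first subtag "x", then one or more subtags
# of 1-8 alphanumeric characters. (Pattern and helper are illustrative.)
PRIVATE_USE = re.compile(r"^x(-[a-z0-9]{1,8})+$", re.IGNORECASE)

def sil_tag(code):
    """Build an 'x-sil-' private-use tag from an Ethnologue/SIL code
    (the convention described in the mail)."""
    tag = "x-sil-" + code.lower()
    if not PRIVATE_USE.match(tag):
        raise ValueError("not a well-formed private-use tag: " + tag)
    return tag

print(sil_tag("NUS"))          # x-sil-nus
print(bool(PRIVATE_USE.match("en")))   # False: "en" is not private-use
```

Such tags are syntactically valid wherever a language tag is accepted (e.g. HTML lang attributes), but their meaning is private to the parties using them, which is why the mail calls the choice of prefix "neither here nor there".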
Received on Wednesday, 18 December 2002 18:28:35 UTC