Re: Handling Aboriginal Languages

Hi Brian,


In terms of creating web content in lesser used languages, I would 
recommend the use of UTF-8 rather than a custom legacy encoding. Most 
major web browsers have used Unicode for quite a long time, even before 
operating systems made the transition to Unicode.


That said, there are some specifics of the Dogrib language that you 
will need to pay special attention to.


Input is relatively trivial: it is easy to generate a keyboard layout 
suitable for Dogrib, but I'd be surprised if you had to input the 
text yourself. I'd assume you'd receive the text as a Word file or a 
text file. You may receive it in a custom legacy encoding or as Unicode. 
It is possible to create mapping files and convert between a custom 
legacy encoding and Unicode. We routinely do this for a number of African 
languages.
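
As a rough illustration of what such a mapping amounts to, here is a 
minimal Python sketch. The byte values in the table are invented for the 
example and do not describe any real Dogrib font encoding; a real mapping 
would have to be built from the actual legacy font you receive. The 
function name is likewise only a placeholder.

```python
import unicodedata

# Hypothetical mapping table: the byte values on the left are invented
# for illustration only. A real table must come from the legacy font.
LEGACY_TO_UNICODE = {
    0x8A: "\u0105",        # a with ogonek
    0x8B: "\u0105\u0300",  # a with ogonek + combining grave
    0x8C: "\u01EB",        # o with ogonek
    0x8D: "\u0294",        # glottal stop
}


def legacy_to_unicode(data: bytes) -> str:
    """Convert bytes in the hypothetical legacy encoding to NFC text."""
    chars = []
    for byte in data:
        if byte in LEGACY_TO_UNICODE:
            chars.append(LEGACY_TO_UNICODE[byte])
        elif byte < 0x80:
            # Plain ASCII passes through unchanged.
            chars.append(chr(byte))
        else:
            # Fail loudly rather than silently dropping unknown bytes.
            raise ValueError(f"unmapped legacy byte 0x{byte:02X}")
    return unicodedata.normalize("NFC", "".join(chars))


if __name__ == "__main__":
    sample = b"d" + bytes([0x8A, 0x8D])
    print([f"U+{ord(ch):04X}" for ch in legacy_to_unicode(sample)])
```

Raising an error on unmapped bytes is a deliberate choice here: with a 
custom encoding it is safer to find gaps in the table early than to 
publish text with missing characters.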


You will need to identify your target audience and their web usage 
patterns. It would be useful to know what the most common browsers and 
operating systems are, as this will affect the display of text.


A couple of points.


Dogrib uses an ogonek to indicate a nasalised vowel. It uses a grave to 
indicate a low tone. In Unicode Normalization Form C, this would mean 
that a nasalised low tone vowel would be represented by a vowel with an 
ogonek followed by a combining grave. Assuming the typical user is using 
Internet Explorer on a Windows platform:


  * you require a font with GPOS or GSUB tables that support the 
character combinations you need (base characters used with combining 
diacritics). This excludes all core fonts on Windows XP and earlier, but 
includes the core fonts on Windows Vista.

  * you require a version of Uniscribe that supports the GSUB and GPOS 
features you need with the Latin script. In practice this means that 
Dogrib text will display correctly on Windows XP SP2, Windows XP SP3 and 
Windows Vista but will be problematic on older versions of Windows.
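

To make the normalization point above concrete, here is a small Python 
sketch using the standard unicodedata module. It assumes the vowel "a" as 
the example and shows that three different ways of typing a nasalised 
low tone vowel all end up as the same sequence in Normalization Form C. 
The variable names are only for illustration.

```python
import unicodedata

# Three ways the same nasalised low tone "a" might arrive in source text.
variants = [
    "a\u0328\u0300",  # a + combining ogonek + combining grave
    "a\u0300\u0328",  # a + combining grave + combining ogonek
    "\u00E0\u0328",   # precomposed a-grave + combining ogonek
]

# Normalizing to NFC collapses all three to a single sequence.
normalized = {unicodedata.normalize("NFC", v) for v in variants}

for text in normalized:
    for ch in text:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The grave stays as a separate combining character because Unicode has no 
precomposed vowel with both ogonek and grave, which is why the font and 
rendering requirements above matter.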


There are alternative approaches, such as Graphite-enabled Firefox 
installations and Graphite fonts, but that is getting too obscure for the 
average user.


Care needs to be taken in the CSS rules using font and font-family 
properties. It is best to leave core Windows fonts out, or to put them 
last in a list of appropriate fonts. Declaring a generic font family is 
pointless and may be harmful in certain circumstances.


Some characters, like the glottal stop and combining diacritics (with 
appropriate OpenType support), will not be in many fonts.


And comparing:

  * http://www.languagegeek.com/dene/tlicho/tlicho.html

  * http://en.wikipedia.org/wiki/Dogrib_language

  * http://www.tlicho.ca/gonaowo-ways/PDF/A_Dogrib_Dictionary.pdf


The Dogrib dictionary uses an alternative glyph for the i-ogonek: a 
dotless i with ogonek. If this is the culturally normative version, 
then the ideal font would need to use this version of i-ogonek.


But I have insufficient data on hand to know whether this is a 
typographic variation or a cultural preference.


Andrew






Brian Cassidy wrote:

> Hello All,
>
> As a web developer in Canada, I've had to deal with both of our
> official languages: French and English. Today I've been given a new
> challenge as one of our clients wants to develop a site in some
> Aboriginal languages (Tlicho [1] for e.g.).
>
> Now, traditionally I just do everything in utf-8 and send that across
> the wire. However, with this language, are there even unicode
> codepoints for it? If so, how would i do the data entry? There are
> fonts available for the language so i could "cheat" and go that route
> as well.
>
> Does anyone have any advice on what direction I should follow?
>
> Thanks in advance,
>
> -Brian Cassidy (brian.cassidy@gmail.com)
>
> [1] http://www.tlicho.ca/
>
>   

-- 
Andrew Cunningham
Senior Manager, Research and Development
Vicnet
State Library of Victoria
328 Swanston Street
Melbourne VIC 3000

Ph: +61-3-8664-7430
Fax: +61-3-9639-2175

Email: andrewc@vicnet.net.au
Alt email: lang.support@gmail.com

http://home.vicnet.net.au/~andrewc/
http://www.openroad.net.au
http://www.vicnet.net.au
http://www.slv.vic.gov.au

Received on Thursday, 15 January 2009 03:16:15 UTC