FW: PAN Localization Project Phase 1 Output from Richard Ishida on 2007-09-04 (www-international@w3.org from July to September 2007)

From: Richard Ishida <ishida@w3.org>
Date: Tue, 4 Sep 2007 12:11:49 +0100
To: "'WWW International'" <www-international@w3.org>
Message-ID: <001d01c7eee4$64605ee0$6501a8c0@rishida>
FYI



From: Sarmad Hussain [mailto:sarmad.hussain@nu.edu.pk] 
Sent: 03 September 2007 06:22
To: s-asia-it@lists.apnic.net
Subject: PAN Localization Project Phase 1 Output



PAN LOCALIZATION PROJECT


(www.PANL10n.net <http://www.panl10n.net/> ) 


A Regional Initiative to Develop Local Language Computing Capacity in Asia


Announcement of Phase 1 Outputs


(Software: http://www.panl10n.net/english/outputChart.htm) 

(Training and Workshops: http://www.panl10n.net/english/activities.htm) 

(About Phase 1: http://www.panl10n.net/english/phase1(main).htm) 

 

The PAN Localization Project is pleased to announce the release of its outputs following the completion of its first phase, which ran from 2004 to 2007.  The project has been a partnership among countries of South and South-East Asia to build capacity, technology and policy for local language computing.  The countries (and languages) included in the project were Afghanistan (Pashto), Bangladesh (Bangla), Bhutan (Dzongkha), Cambodia (Khmer), Laos (Lao), Nepal (Nepali) and Sri Lanka (Sinhala, Tamil).

 

The project carried out an extensive training program to raise capacity to develop language technology, conducting national and regional short-term and long-term programs across all partner countries.  Training has been imparted in linguistics, standards development, open source software localization, speech processing, script processing and computational linguistics.  Details of the training conducted, the training programs and the training material are published on the project website (under the Activities link).  The first phase has also built an Asian network of researchers to share knowledge in language computing.  The project has been publishing (and continues to publish) research reports, and documenting effective processes, results and recommendations.

  

In Bangladesh, a Bangla lexicon and spell-checking and sorting software have been developed, in addition to a Bangla Optical Character Reader, which will contribute to the computing needs of more than 200 million Bangla speakers around the world.  These applications are being integrated into the BanglaPad software, which is open source and can run on multiple computing platforms.

 

In Bhutan, a Dzongkha Linux Distribution has been developed and released, along with the associated language technology standards.  This distribution enables users to do word processing, chatting, web browsing, emailing, multimedia access, CD burning and a host of other tasks in their national language.  In addition, 500 keyboards have been manufactured to the Dzongkha keyboard standard developed through the project, and distributed to promote the use of the Dzongkha Linux Distribution.

 

Software has been developed to compose Khmer text as well.  This software allows data to be sorted according to the Choun Nath dictionary published by the Government of Cambodia, performs automatic word segmentation of Khmer text and checks Khmer text for spelling errors.  The work done on Khmer further includes Unicode-compatible fonts, utilities to convert most non-Unicode fonts to Unicode, a Khmer text corpus and a Khmer lexicon.

 

For the Lao language, fonts, a keyboard layout, word segmentation and a lexicon have been developed and integrated into an end-user application.  A Lao Optical Character Reader is also being developed to scan printed Lao material and automatically convert it into editable text, to accelerate online content generation.

 

A complete Nepali Linux Distribution has also been developed.  The distribution contains word-processing, spreadsheet, presentation, chatting, web browsing, accounting and other software, all enabled completely in Nepali, with Nepali help files.  Further work is in progress on a Nepali lexicon, spell checker and grammar checker, and on localization of a mobile platform.

 

Text-to-Speech (TTS) and Optical Character Recognition systems for Sinhala have been developed in Sri Lanka and are currently being tested and improved for end-user deployment.  Some initial work is also being done to use TTS technology to give access to the blind community through an open source screen reader.  In addition, a comprehensive Sinhala corpus and a Sinhala-Tamil-English dictionary are being developed.

 

In Afghanistan, encoding, keyboard and collation standards have been researched and developed for consideration and ratification by the government.  More is planned, despite the challenges of availability and retention of qualified resources.

 

The project will continue beyond the completion of the first phase.  Phase II of the PAN Localization Project will research the challenges associated with the digital literacy of end-users who use the localized technology to communicate and to produce local language content.  The project will also continue to mature the language technology in the target languages.  Three more countries (and languages) are included in the second phase of the project: China (Tibetan), Mongolia (Mongolian), and Pakistan (Urdu).

 

 

 

The project is being funded by

 

The International Development Research Centre (IDRC), Canada, through its Pan Asia Networking (PAN) program

 

and coordinated by 

 

National University of Computer and Emerging Sciences, Pakistan (NUCES) through its Centre for Research in Urdu Language Processing (CRULP)

 

 

(Apologies for duplicate postings).

 

Best regards,

 

Prof. Sarmad Hussain

Center for Research in Urdu Language Processing

National University of Computer and Emerging Sciences

B Block, Faisal Town

Lahore, PAKISTAN

Ph: (+9242) 111 128 128 ext. 241

Fax: (+9242) 516 5232

URL: www.crulp.org    www.nu.edu.pk
Received on Tuesday, 4 September 2007 11:09:42 UTC