RE: Describing other cultural aspects of the content

From: Masaki Itagaki <masaki_itagaki@aliquantuminc.com>
Date: Fri, 15 Apr 2005 23:25:46 -0600
To: <public-i18n-its@w3.org>
Message-ID: <133.16701.1120552808@automsgid.listhub.w3.org>
Hi Najib


Thank you for the additional information about Arabic. It is pretty interesting (especially the fact that “April” is pronounced as “Nissan”... really interesting!).

A quick question: is there any variation WITHIN the same locale? For example, ABRIL and NISSAN can be differentiated by locale, e.g. ar-MA and ar-EG, can’t they? I imagine that if you set your system locale to ar-MA or ar-EG, a calendar application (if it’s appropriately designed from the i18n point of view...) shows “ABRIL” or “NISSAN” accordingly. I might well be wrong, but if that’s the case, the variation can be handled by existing locales.

However, if there are other variations within one locale (such as ar-MA), that could be the issue this requirement is trying to address. If you have such an example, I would like to include it in the requirement. Japanese is a good example here, since there is only one locale for Japanese (ja-JP) but obviously there are many cultural variants... Anyway, I appreciate your comments.
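To make the distinction concrete, here is a minimal Python sketch, not taken from any ITS draft. The month table only restates the supposition above and is not verified locale data. The `style` argument and its values "numeric" and "wareki" are hypothetical labels, and only the Heisei era is handled.

```python
from datetime import date

# Hypothetical table restating the supposition in this thread;
# it is NOT verified locale data.
APRIL_BY_LOCALE = {"ar-MA": "ABRIL", "ar-EG": "NISSAN"}


def april_name(locale: str) -> str:
    """Variation BETWEEN locales: the locale tag alone selects the name."""
    return APRIL_BY_LOCALE[locale]


def format_ja_date(d: date, style: str) -> str:
    """Variation WITHIN ja-JP: the locale is fixed, so an extra key is needed.

    `style` is a hypothetical label, either "numeric" or "wareki".
    This sketch only covers the Heisei era (1989-01-08 to 2019-04-30).
    """
    if style == "numeric":
        return f"{d.year}/{d.month:02d}/{d.day:02d}"
    if style == "wareki":
        if not (date(1989, 1, 8) <= d <= date(2019, 4, 30)):
            raise ValueError("sketch only covers the Heisei era")
        # Heisei 1 corresponds to 1989, so subtract 1988 from the year.
        return f"Heisei {d.year - 1988}/{d.month:02d}/{d.day:02d}"
    raise ValueError(f"unknown style: {style}")
```

In the first function the locale is enough to pick the right form. In the second, both outputs are valid for ja-JP, so something beyond the locale has to be declared.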





From: public-i18n-its-request@w3.org [mailto:public-i18n-its-request@w3.org] On Behalf Of Najib Tounsi
Sent: Friday, April 15, 2005 1:58 PM
To: masaki_itagaki@aliquantuminc.com
Cc: public-i18n-its@w3.org
Subject: Re: Describing other cultural aspects of the content


Masaki Itagaki wrote: 

Basically this requirement is the same as the one in the original draft, but I added the issue of writing styles as I discussed in the MLs and conference calls. As to the core portion of this requirement, it's highly likely that I missed something or am getting something wrong. Please post any comments on this.





It must be possible to declare more information about content than a language/locale for better text parsing and content reusability. Aspects that require finer granularity of content specification may include script usages, geographical areas, dialects or content context. The declaration of such an attribute should be done at the beginning of a document. Any content within a document which varies from the primary declaration should be labeled appropriately. 


Background :

In order to successfully and efficiently parse document content, there should be more information than a language or a locale. Examples of issues are: 

           A language/locale cannot perfectly represent orthography: e.g. “zh-CN” does not stipulate if it’s simplified or traditional Chinese. The locale for Yugoslavia does not provide guidance as to whether the language should be written in Latin or Cyrillic script. 

           Multiple cultural preferences within one locale: e.g. In Japanese (“ja-JP”), there are two official date formats: Japanese emperor date (Wareki) and a standard numeric date [...] How does one indicate that a voice track is in the language spoken in German-speaking Switzerland rather than the language written there, since one is Schwytzertuutsch (Swiss German) and the other is very close to but not the same as 'High German'? How does one indicate that a piece of content is in 'International Spanish'? How does one indicate that this is English as spoken in the time of Chaucer?

           Different writing styles and tones in one language: e.g. Japanese uses a polite style (“Desu/masu tone”) for user guides and a formal style (“Da/dearu tone”) for academic and legal content. Italian uses an informal style for software help content and a formal style for user guides.  

Just another example: difference in words or characters. 
- In Arabic, most Middle East countries use Indic digits, while western countries (e.g. North Africa) use Latin digits.
- Months (..., May, June, ...) also have different names. In Morocco, for April, we write "ABRIL", pronounced abril; in the Middle East they write "NISSAN", pronounced nissan. 
Here is an example from the newspaper  Asharq Alawsat http://www.aawsat.com/view/ 
where you have (from left to right)  "2005 (NISSAN) ABRIL..."
The less used month name NISSAN is in parentheses. 

When translating from English to Arabic, a translator should distinguish ar-MA from ar-EG.



Identifying these variations is very important, especially for content reusability. For example, the same source-language content could be translated into two different target-language content units, depending on a context that calls for different writing styles (e.g. formal and informal in Italian). When the content is reused in both source and target languages, context information (such as whether the content is for a user guide or for user help) must be provided in order to reuse content with an appropriate writing style.   
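The reuse problem described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the store, the context labels "user-guide" and "help", and the function name `reuse` are all hypothetical, and the target strings are placeholders rather than real Italian translations.

```python
from typing import Optional

# Hypothetical reuse store keyed by (source text, context label).
# The target strings are placeholders, not real translations.
MEMORY = {
    ("Click Save.", "user-guide"): "<Italian, formal style>",
    ("Click Save.", "help"): "<Italian, informal style>",
}


def reuse(source: str, context: str) -> Optional[str]:
    """Return a stored target only when both the text and the context match."""
    return MEMORY.get((source, context))
```

Keying on the source text alone would make the two entries collide, which is why the context has to travel with the content.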



Najib TOUNSI (mailto:tounsi@w3.org)
Bureau W3C au Maroc (http://www.w3c.org.ma/)
Ecole Mohammadia d'Ingenieurs, BP 765 Agdal-RABAT Maroc (Morocco)
Phone : +212 (0) 37 68 71 74  Fax : +212 (0) 37 77 88 53
Mobile: +212 (0) 61 22 00 30

Received on Saturday, 16 April 2005 05:27:09 UTC