
Re: Describing other cultural aspects of the content

From: Najib Tounsi <ntounsi@emi.ac.ma>
Date: Fri, 15 Apr 2005 19:57:59 +0000
Message-ID: <42601CC7.5040007@emi.ac.ma>
To: masaki_itagaki@aliquantuminc.com
CC: public-i18n-its@w3.org
Masaki Itagaki wrote:

> Basically this requirement is the same as the one in the original 
> draft, but I added the issue of writing styles as I discussed in the 
> MLs and conference calls. As to the core portion of this requirement, 
> it's highly likely that I missed something or am getting something 
> wrong. Please post any comments on this.
>
>  
>
> Masaki
>
> ------------------------------------------------------------------------
>
> *Requirement: *
>
> It must be possible to declare more information about content than a 
> language/locale for better text parsing and content reusability. 
> Aspects that require finer granularity of content specification may 
> include script usages, geographical areas, dialects or content 
> context. The declaration of such an attribute should be done at the 
> beginning of a document. Any content within a document which varies 
> from the primary declaration should be labeled appropriately.
>
>  
>
> *Background :*
>
> In order to successfully and efficiently parse document content, there 
> should be more information than a language or a locale. Examples of 
> issues are:
>
>            *A language/locale cannot perfectly represent 
> orthography*: e.g. “zh-CN” does not stipulate if it’s simplified or 
> traditional Chinese. Locale for Yugoslavia does not provide guidance 
> as to whether the language should be written in Latin or Cyrillic 
> scripts.
>
>            *Multiple cultural preferences within one locale*: e.g. In 
> Japanese (“ja-JP”), there are two official date formats – Japanese 
> emperor date (Wareki) and a standard numeric date [...] that a voice track is in 
> the language spoken in German-speaking Switzerland rather than the 
> language written there, since one is Schwytzertuutsch (Swiss German) 
> and the other is very close to but not the same as 'High German'? How 
> does one indicate that a piece of content is in 'International 
> Spanish'? How does one indicate that this is English as spoken in the 
> time of Chaucer?
>
>            *Different writing styles and tones in one language*: e.g. 
> Japanese uses a polite style (“Desu/masu tone”) for user guides and a 
> formal style (“Da/dearu tone”) for academic and legal content. Italian 
> uses an informal style for software help content and a formal style 
> for user guides. 
>
Just another example: differences in words or characters.
- In Arabic, most Middle Eastern countries use Indic digits, while western 
countries (e.g. North Africa) use Latin digits.
- Months (..., May, June, ...) also have different names. In Morocco, 
for April, we write "ABRIL", pronounced abril; in the Middle East they write 
"NISSAN", pronounced nissan.
Here is an example from the newspaper Asharq Alawsat 
http://www.aawsat.com/view/
date
where you have (from left to right) "2005 (NISSAN) ABRIL..."
The less-used month name, NISSAN, is in parentheses.

When translating from English into Arabic, a translator should distinguish 
ar-MA from ar-EG.
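
To make the point concrete, here is a rough sketch in Python of how a tool 
might pick digits and a month name per locale. The table is hand-written for 
illustration and is not taken from any locale database. The names 
CONVENTIONS and format_april_date are hypothetical, and the tag ar-SY is my 
assumption, standing in for the Middle Eastern usage described above.

```python
# Translation table from ASCII digits to Arabic-Indic digits (U+0660 to U+0669).
TO_ARABIC_INDIC = str.maketrans(
    "0123456789", "".join(chr(0x0660 + i) for i in range(10))
)

# Hypothetical, hand-written conventions for illustration only.
# Month names are the transliterations used in this message.
CONVENTIONS = {
    "ar-MA": {"arabic_indic_digits": False, "april": "ABRIL"},
    # Assumed tag for a Middle Eastern locale; not named in the message.
    "ar-SY": {"arabic_indic_digits": True, "april": "NISSAN"},
}


def format_april_date(day: int, year: int, locale: str) -> str:
    """Render a date in April using the conventions listed for the locale."""
    conv = CONVENTIONS[locale]  # raises KeyError for an unlisted locale
    text = f"{day} {conv['april']} {year}"
    if conv["arabic_indic_digits"]:
        # Only ASCII digits are remapped; the month name is left untouched.
        text = text.translate(TO_ARABIC_INDIC)
    return text


print(format_april_date(15, 2005, "ar-MA"))
print(format_april_date(15, 2005, "ar-SY"))
```

With only "ar" as the language tag, such a lookup has nothing to branch on, 
which is why the region (or some finer label) needs to be declared.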

Najib

>  
>
> Identifying these variations is very important especially for content 
> reusability. For example, the same source-language content could be 
> translated into two different target-language content units depending 
> on context that leads to different writing styles (e.g. formal and 
> informal in Italian). When the content is reused both in source and 
> target languages, context information (such as whether the content is 
> for a user guide or a user help) must be provided in order to reuse 
> content with an appropriate writing style.  
>
>  
>
>  
>

-- 
Najib TOUNSI (mailto:tounsi@w3.org)
Bureau W3C au Maroc (http://www.w3c.org.ma/)
Ecole Mohammadia d'Ingenieurs, BP 765 Agdal-RABAT Maroc (Morocco)
Phone : +212 (0) 37 68 71 74  Fax : +212 (0) 37 77 88 53
Mobile: +212 (0) 61 22 00 30
Received on Friday, 15 April 2005 19:57:28 GMT
