
Re: Describing other cultural aspects of the content

From: Najib Tounsi <ntounsi@emi.ac.ma>
Date: Fri, 15 Apr 2005 19:57:59 +0000
Message-ID: <42601CC7.5040007@emi.ac.ma>
To: masaki_itagaki@aliquantuminc.com
CC: public-i18n-its@w3.org
Masaki Itagaki wrote:

> Basically this requirement is the same as the one in the original 
> draft, but I added the issue of writing styles as I discussed in the 
> MLs and conference calls. As to the core portion of this requirement, 
> it's highly likely that I missed something or am getting something 
> wrong. Please post any comments on this.
> Masaki
> ------------------------------------------------------------------------
> *Requirement: *
> It must be possible to declare more information about content than a 
> language/locale for better text parsing and content reusability. 
> Aspects that require finer granularity of content specification may 
> include script usages, geographical areas, dialects or content 
> context. The declaration of such an attribute should be done at the 
> beginning of a document. Any content within a document which varies 
> from the primary declaration should be labeled appropriately.
> *Background :*
> In order to successfully and efficiently parse document content, there 
> should be more information than a language or a locale. Examples of 
> issues are:
>            *A language/locale cannot perfectly represent 
> orthography*: e.g. “zh-CN” does not stipulate if it’s simplified or 
> traditional Chinese. Locale for Yugoslavia does not provide guidance 
> as to whether the language should be written in Latin or Cyrillic 
> scripts.
>            *Multiple cultural preferences within one locale*: e.g. In 
> Japanese (“ja-JP”), there are two official date formats – Japanese 
> emperor date (Wareki) and a standard numeric date. [...] that a voice track is in 
> the language spoken in German-speaking Switzerland rather than the 
> language written there, since one is Schwytzertuutsch (Swiss German) 
> and the other is very close to but not the same as 'High German'? How 
> does one indicate that a piece of content is in 'International 
> Spanish'? How does one indicate that this is English as spoken in the 
> time of Chaucer?
>            *Different writing styles and tones in one language*: e.g. 
> Japanese uses a polite style (“Desu/masu tone”) for user guides and a 
> formal style (“Da/dearu tone”) for academic and legal content. Italian 
> uses an informal style for software help content and a formal style 
> for user guides. 
Just another example: differences in words or characters.
- In Arabic, most Middle Eastern countries use Arabic-Indic digits, while 
western Arab countries (e.g. in North Africa) use Latin digits.
- Months (..., May, June, ...) also have different names. In Morocco, 
for April, we write "ABRIL", pronounced abril; in the Middle East they write 
"NISSAN", pronounced nissan.
Here is an example from the newspaper Asharq Alawsat, 
where you have (from left to right) "2005 (NISSAN) ABRIL...".
The less-used month name NISSAN is in parentheses.

When translating from English to Arabic, a translator should distinguish 
ar-MA from ar-EG.
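
As a rough illustration of the digit difference, here is a minimal Python sketch. The table of preferences and the function name localize_digits are hypothetical, made up for this example rather than taken from any standard or library; the only assumption about the data is the one stated above, that ar-EG text uses Arabic-Indic digits and ar-MA text uses Latin digits.

```python
# Map ASCII digits 0-9 to the Arabic-Indic digits U+0660..U+0669.
ARABIC_INDIC = {ord(str(d)): chr(0x0660 + d) for d in range(10)}

# Hypothetical preference table, assumed for illustration only:
# True means text in that locale is written with Arabic-Indic digits.
USES_ARABIC_INDIC_DIGITS = {
    "ar-MA": False,  # Morocco: Latin digits
    "ar-EG": True,   # Egypt: Arabic-Indic digits
}


def localize_digits(text: str, tag: str) -> str:
    """Return text with digits rewritten for the given language tag.

    Unknown tags are left unchanged, since guessing would be wrong
    for some locales.
    """
    if USES_ARABIC_INDIC_DIGITS.get(tag, False):
        return text.translate(ARABIC_INDIC)
    return text


if __name__ == "__main__":
    for tag in ("ar-MA", "ar-EG"):
        print(tag, localize_digits("2005", tag))
```

The same lookup-by-tag approach could carry month names as well, but a tool can only pick the right form if the content is labelled more precisely than plain "ar".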


> Identifying these variations is very important especially for content 
> reusability. For example, the same source-language content could be 
> translated into two different target-language content units depending 
> on context that leads to different writing styles (e.g. formal and 
> informal in Italian). When the content is reused both in source and 
> target languages, context information (such as whether the content is 
> for a user guide or a user help) must be provided in order to reuse 
> content with an appropriate writing style.  

Najib TOUNSI (mailto:tounsi@w3.org)
Bureau W3C au Maroc (http://www.w3c.org.ma/)
Ecole Mohammadia d'Ingenieurs, BP 765 Agdal-RABAT Maroc (Morocco)
Phone : +212 (0) 37 68 71 74  Fax : +212 (0) 37 77 88 53
Mobile: +212 (0) 61 22 00 30
Received on Friday, 15 April 2005 19:57:28 UTC
