- From: Najib Tounsi <ntounsi@emi.ac.ma>
- Date: Fri, 15 Apr 2005 19:57:59 +0000
- To: masaki_itagaki@aliquantuminc.com
- CC: public-i18n-its@w3.org
- Message-ID: <42601CC7.5040007@emi.ac.ma>
Masaki Itagaki wrote:

> Basically this requirement is the same as the one in the original
> draft, but I added the issue of writing styles as I discussed in the
> MLs and conference calls. As to the core portion of this requirement,
> it's highly likely that I missed something or am getting something
> wrong. Please post any comments on this.
>
> Masaki
>
> ------------------------------------------------------------------------
>
> *Requirement:*
>
> It must be possible to declare more information about content than a
> language/locale for better text parsing and content reusability.
> Aspects that require finer granularity of content specification may
> include script usages, geographical areas, dialects or content
> context. The declaration of such an attribute should be done at the
> beginning of a document. Any content within a document which varies
> from the primary declaration should be labeled appropriately.
>
> *Background:*
>
> In order to successfully and efficiently parse document content, there
> should be more information than a language or a locale. Examples of
> issues are:
>
> *A language/locale cannot perfectly represent orthography*: e.g.
> “zh-CN” does not stipulate if it’s simplified or traditional Chinese.
> Locale for Yugoslavia does not provide guidance as to whether the
> language should be written in Latin or Cyrillic scripts.
>
> *Multiple cultural preferences within one locale*: e.g. In Japanese
> (“ja-JP”), there are two official date formats – Japanese emperor date
> (Wareki) and a standard numeric date [...] that a voice track is in
> the language spoken in German-speaking Switzerland rather than the
> language written there, since one is Schwytzertuutsch (Swiss German)
> and the other is very close to but not the same as 'High German'? How
> does one indicate that a piece of content is in 'International
> Spanish'? How does one indicate that this is English as spoken in the
> time of Chaucer?
>
> *Different writing styles and tones in one language*: e.g. Japanese
> uses a polite style (“Desu/masu tone”) for user guides and a formal
> style (“Da/dearu tone”) for academic and legal content. Italian uses
> an informal style for software help content and a formal style for
> user guides.

Just another example: differences in words or characters.

- In Arabic, most Middle Eastern countries use Arabic-Indic digits,
  while western Arab countries (e.g. North Africa) use Latin digits.
- Months (..., mai, june ...) also have different names. In Morocco,
  for April, we write "ABRIL", pronounced abril; in the Middle East
  they write "NISSAN", pronounced nissan.

Here is an example from the newspaper Asharq Alawsat
(http://www.aawsat.com/view/): in its date you have (from left to
right) "2005 (NISSAN) ABRIL...". The less-used month name, NISSAN, is
in parentheses.

When translating from English to Arabic, a translator should
distinguish ar-MA from ar-EG.

Najib

> Identifying these variations is very important especially for content
> reusability. For example, the same source-language content could be
> translated into two different target-language content units depending
> on context that leads to different writing styles (e.g. formal and
> informal in Italian). When the content is reused both in source and
> target languages, context information (such as whether the content is
> for a user guide or a user help) must be provided in order to reuse
> content with an appropriate writing style.

-- 
Najib TOUNSI (mailto:tounsi@w3.org)
Bureau W3C au Maroc (http://www.w3c.org.ma/)
Ecole Mohammadia d'Ingenieurs, BP 765 Agdal-RABAT Maroc (Morocco)
Phone : +212 (0) 37 68 71 74
Fax   : +212 (0) 37 77 88 53
Mobile: +212 (0) 61 22 00 30
Received on Friday, 15 April 2005 19:57:28 UTC