- From: John Cowan <jcowan@reutershealth.com>
- Date: Mon, 27 Dec 2004 13:07:07 -0500
- To: Andrew Cunningham <andj_c@iprimus.com.au>
- Cc: Tex Texin <tex@xencraft.com>, WWW International <www-international@w3.org>, IETF Languages <ietf-languages@iana.org>
Andrew Cunningham scripsit:

> ar-SD (Arabic) also this tag could be considered to be ambiguous .. is
> it the national language of Sudan (Standard Modern Arabic) or is it the
> Sudanese Arabic dialect?

At present the ar language tag is inherently ambiguous: ISO 639-1 maps
it to the name "Arabic", without any clarification of what language or
languages "Arabic" might refer to.

The editor's draft of ISO 639-3 clarifies the mapping of "ar" to refer
to both Modern Standard Arabic and the colloquials, and tentatively
assigns the code "arb" to MSA and different codes to the 29 recognized
colloquials.  In the language of the draft, codes like "ar" refer to
what are called "macro-languages", explained as follows:

# In various parts of the world, there are clusters of closely-related
# language varieties that, based on the criteria discussed in 4.2.1,
# can be considered individual languages, yet in certain usage contexts a
# single language identity for all is needed. Typical situations in which
# this need can occur include the following:
#
#     There is one variety that is more developed and that tends
#     to be used for wider communication by speakers of various
#     closely-related languages; as a result, there is a perceived
#     common linguistic identity across these languages. For instance,
#     there are several distinct spoken Arabic languages, but Standard
#     Arabic is generally used in business and media across all of
#     these communities, and is also an important aspect of a shared
#     ethno-religious unity. As a result, a perceived common linguistic
#     identity exists.
#
#     There is a common written form used for multiple closely-related
#     languages. For instance, multiple Chinese languages share a
#     common written form.
#
#     There is a transitional socio-linguistic situation in which
#     sub-communities of a single language community are diverging,
#     creating a need for some purposes to recognise distinct
#     languages while, for other purposes, a single common identity
#     is still valid. For instance, in some business contexts it is
#     necessary to make a distinction between Bosnian, Croatian and
#     Serbian languages, yet there are other contexts in which these
#     distinctions are not discernable in language resources that are
#     in use.
#
# Where such situations exist, an identifier for the single, common language
# identity is considered in this part of ISO 639 to be a macrolanguage
# identifier. Macrolanguages are distinguished from language collections
# in that the individual languages that correspond to a macrolanguage must
# be very closely related, and there must be some domain in which only a
# single language identity is recognized.

The draft specifies the following 56 macrolanguages:

ak      Akan (2 languages)
ar      Arabic (30 languages)
ay      Aymara (2 languages)
az      Azerbaijani (2 languages)
bal     Baluchi (3 languages)
bik     Bikol (5 languages)
bua     Buriat (3 languages)
chm     Mari (2 languages)
cr      Cree (6 languages)
del     Delaware (2 languages)
den     Slave (2 languages)
din     Dinka (5 languages)
doi     Dogri (2 languages)
fa      Persian (2 languages)
ff      Fulah (9 languages)
fy      Frisian (3 languages)
gba     Gbaya (5 languages)
gn      Guarani (5 languages)
gon     Gondi (2 languages)
grb     Grebo (5 languages)
hai     Haida (2 languages)
hbs     Serbo-Croatian (3 languages)
hmn     Hmong (21 languages)
ik      Inupiaq (2 languages)
iu      Inuktitut (2 languages)
jrb     Judeo-Arabic (5 languages)
kg      Kongo (3 languages)
kok     Konkani (2 languages)
kpe     Kpelle (2 languages)
kr      Kanuri (3 languages)
ku      Kurdish (3 languages)
kv      Komi (2 languages)
lah     Lahnda (8 languages)
man     Mandingo (7 languages)
mg      Malagasy (10 languages)
mn      Mongolian (2 languages)
ms      Malay (13 languages)
mwr     Marwari (7 languages)
no      Norwegian (2 languages)
oc      Occitan; Provençal (5 languages)
oj      Ojibwa (7 languages)
om      Oromo (4 languages)
ps      Pushto (3 languages)
qu      Quechua (44 languages)
raj     Rajasthani (6 languages)
rom     Romany (7 languages)
sc      Sardinian (4 languages)
sq      Albanian (4 languages)
sw      Swahili (2 languages)
syr     Syriac (2 languages)
tmh     Tamashek (4 languages)
uz      Uzbek (2 languages)
yi      Yiddish (2 languages)
za      Zhuang (2 languages)
zap     Zapotec (58 languages)
zh      Chinese (13 languages)

-- 
Even a refrigerator can conform to the XML      John Cowan
Infoset, as long as it has a door sticker       jcowan@reutershealth.com
saying "No information items inside".           http://www.reutershealth.com
        --Eve Maler                             http://www.ccil.org/~cowan
Received on Monday, 27 December 2004 18:07:52 UTC