Re: Language Identifier List up for comments

From: Elizabeth J. Pyatt <ejp10@psu.edu>
Date: Tue, 14 Dec 2004 10:02:28 -0500
Message-Id: <p06100501bde4aea1fcd0@[128.118.8.31]>
To: John Burger <john@mitre.org>
Cc: www-international@w3.org, ietf-languages@alvestrand.no

I actually think John made my point better than I did.

My question is whether you want to capture only standard languages or
all possible dialects. If it is the latter, then the list will get
much larger and will not be restricted to nations. Some dialects are
regional, but others are based more on socio-economic factors (e.g.
African American Vernacular English). I would actually recommend a
separate working group for that.

As a linguist, I would want a taxonomy to describe all spoken
languages/dialects. For instance, there is no "language code" for
the different spoken Chinese forms (e.g. Cantonese, Hakka, etc.), and
using a country code would not be adequate to distinguish them.
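
To make the country-code limitation concrete, here is a minimal Python
sketch. The variety-to-country table is a simplified assumption made up
for illustration, not data from any code list or registry, and the
function names are hypothetical. It only shows that a "language-COUNTRY"
tag can end up standing for several distinct spoken varieties.

```python
# Illustrative only: this mapping is a simplified assumption for the example,
# not data taken from any language or country code registry.
SPOKEN_VARIETIES = {
    "Mandarin": ["CN", "TW"],
    "Cantonese": ["CN", "HK"],
    "Hakka": ["CN", "TW"],
}


def country_tags(language_code, varieties):
    """Group spoken varieties under the language-COUNTRY tag they would share."""
    tags = {}
    for variety, countries in varieties.items():
        for country in countries:
            tag = f"{language_code}-{country}"
            tags.setdefault(tag, []).append(variety)
    return tags


def ambiguous_tags(tags):
    """Keep only the tags that stand for more than one spoken variety."""
    return {tag: names for tag, names in tags.items() if len(names) > 1}


if __name__ == "__main__":
    collisions = ambiguous_tags(country_tags("zh", SPOKEN_VARIETIES))
    for tag, names in sorted(collisions.items()):
        print(tag, "->", ", ".join(names))
```

Under these assumptions, any tag that the script prints covers more than
one spoken form, so a recording labelled with that tag alone could not be
attributed to a single variety.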

For written language, this is not normally an issue because the 
phonetics are not represented. Therefore a single code of "zh" is 
adequate. If you are considering standard written languages only, 
then I think the current list is on the right track, but would need 
tweaks.

Spoken varieties, on the other hand, are a really big project.

Elizabeth Pyatt

>Elizabeth J. Pyatt wrote:
>
>>Do you really need to specify different types of English used in 
>>the United States territories (e.g. Puerto Rico, Guam, etc.)? I'm 
>>aware that there are local varieties in some cases, but I'm not 
>>sure they are reflected in the WRITTEN forms, just in 
>>pronunciation. That is, business English is the same in Puerto Rico 
>>as in the continental U.S.
>
>>Theoretically, you could create a pronunciation/syntax engine for 
>>en-PR as well as en-TX (Texas), en-NYC (New Yawk City), etc, but 
>>I'm not sure how well received it would be as a serious tool.
>
>Machine-generated speech is presumably not the only spoken resource 
>that needs language codes.  I can imagine labeling a library of 
>recordings of English speakers from lots of places, and thus needing 
>all of en-US-PR, en-US-TX and en-US-NY-NewYorkCity.
>
>- John D. Burger
>   MITRE


-- 
=-=-=-=-=-=-=-=-=-=-=-=-=
Elizabeth J. Pyatt, Ph.D.
Instructional Designer
Education Technology Services, TLT/ITS
Penn State University
ejp10@psu.edu, (814) 865-0805 or (814) 865-2030 (Main Office)

210 Rider Building II
227 W. Beaver Avenue
State College, PA   16801-4819
http://www.personal.psu.edu/ejp10/psu
http://tlt.psu.edu
Received on Tuesday, 14 December 2004 15:02:37 GMT
