- From: Christian Chiarcos <chiarcos@informatik.uni-frankfurt.de>
- Date: Tue, 20 Dec 2016 23:20:52 +0100
- To: Misha.Wolf@thomsonreuters.com, mvalle@cscs.ch, "Doug Ewell" <doug@ewellic.org>
- Cc: ietf-languages@iana.org, semantic-web@w3.org
- Message-ID: <op.yssi02ul89jat0@kitaba.sitecomwlr2100>
On .12.2016, 19:32, Doug Ewell <doug@ewellic.org> wrote:

> And if you are following BCP 47, you need to use the IANA Language
> Subtag Registry as the source of language subtags, and not worry about
> ISO 639-1 and 639-3 and the rest.

+1

Deviating from this is literally tag abuse. Bad practice, though
sometimes, it does happen -- and normally for a reason.

However, if existing language tags really aren't sufficient, it is better
*not* to use a (potentially misleading or ill-defined) language tag, but
to represent language variety information of a resource abc:xyz with an
explicit link to a repository such as Glottolog:

    abc:xyz dcterms:language <http://glottolog.org/resource/languoid/id/stan1293>.

Christian

>
> --
> Doug Ewell | Thornton, CO, US | ewellic.org
>
> -------- Original message --------
> From: Misha.Wolf@thomsonreuters.com
> Date: 12/20/16 10:34 (GMT-07:00)
> To: chiarcos@informatik.uni-frankfurt.de, mvalle@cscs.ch
> Cc: ietf-languages@iana.org, semantic-web@w3.org, christian.chiarcos@web.de
> Subject: RE: Clarification about language tag
>
> + ietf-languages@iana.org
>
> Whether you work with modern-day languages or any other languages, you
> must follow BCP47.
>
> And, in following BCP-47, you must be prepared to use whatever language
> tag length is required, not assume that all language tags will have the
> same length.
>
> Consider, for example, these longer language tags:
> - sr-Cyrl = Serbian (Cyrillic)
> - sr-Latn = Serbian (Latin)
> - uz-Cyrl = Uzbek (Cyrillic)
> - uz-Latn = Uzbek (Latin)
>
> Regards,
> Misha
>
>
> -----Original Message-----
> From: Christian Chiarcos [mailto:chiarcos@informatik.uni-frankfurt.de]
> Sent: 20 December 2016 17:19
> To: semantic-web@w3.org Web; Mario Valle
> Cc: christian.chiarcos@web.de
> Subject: Re: Clarification about language tag
>
> Dear Mario,
>
>> In Turtle syntax the @lang tag syntax refers to BCP47 that states:
>>
>> language = 2*3ALPHA ; shortest ISO 639 code
>>
>> That is, the language code (I ignore all the variants here) should be 2
>> or 3 characters.
>
> This means you should use the two-letter code for a language that has
> one (@en) even if it does have a three-letter code (@eng). Not every
> language does have a two-letter code.
>
>> Indeed ISO 639
>> (http://www.loc.gov/standards/iso639-2/php/code_list.php)
>> lists both 2 and 3 chars codes (e.g., English: 'en' and 'eng').
>>
>> But in all Turtle examples I have found the language code has 2 chars.
>> Is it a requirement or is simply a tradition? This means, could I write
>> "Pancake"@eng?
>>
>> The question arises because WordNet contains 3 chars codes, so to
>> transform into triples, should/shouldn't I convert it to 2 characters?
>
> The reason is that the 2-character codes are insufficient from the
> perspective of multilingual NLP or linguistics where ISO 639-3 is much
> more established (and somewhat better defined) than ISO 639-1 2-letter
> codes. Therefore, people developing language resources (like WordNet)
> sometimes tend to neglect ISO 639-1 codes altogether. I also went that
> way at times. In terms of BCP47, however, this is a mistake and should
> be fixed. As long as you work with modern-day major languages only and
> you don't see issues with the 2-letter codes for your task/resource, you
> should definitely follow BCP47 and use 2-letter codes wherever possible.
>
> Best,
> Christian
>
>>
>> Thanks for your patience
>>
>> mario
>>
>
>
> --
> Prof. Dr. Christian Chiarcos
> Applied Computational Linguistics
> Johann Wolfgang Goethe Universität Frankfurt a. M.
> 60054 Frankfurt am Main, Germany
>
> office: Robert-Mayer-Str. 10, #401b
> mail: chiarcos@informatik.uni-frankfurt.de
> web: http://acoli.cs.uni-frankfurt.de
> tel: +49-(0)69-798-22463
> fax: +49-(0)69-798-28931
>
> _______________________________________________
> Ietf-languages mailing list
> Ietf-languages@alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/ietf-languages

--
Prof. Dr. Christian Chiarcos
Applied Computational Linguistics
Johann Wolfgang Goethe Universität Frankfurt a. M.
60054 Frankfurt am Main, Germany

office: Robert-Mayer-Str. 10, #401b
mail: chiarcos@informatik.uni-frankfurt.de
web: http://acoli.cs.uni-frankfurt.de
tel: +49-(0)69-798-22463
fax: +49-(0)69-798-28931
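
The advice in this thread (use the two-letter code where a language has one, otherwise keep the three-letter code) could be sketched roughly as below for the WordNet-to-Turtle conversion Mario asks about. This is only an illustration, not anyone's actual converter: the names THREE_TO_TWO, shortest_language_subtag and turtle_literal are hypothetical, and the four-entry table is a hand-written excerpt. A real tool would presumably derive the mapping from the IANA Language Subtag Registry.

```python
# Illustrative excerpt only (assumption): a real converter should derive
# this mapping from the IANA Language Subtag Registry, not a hand-written table.
THREE_TO_TWO = {
    "eng": "en",
    "deu": "de",
    "srp": "sr",
    "uzb": "uz",
}


def shortest_language_subtag(code: str) -> str:
    """Return the two-letter subtag if the table has one, else the code itself."""
    normalized = code.strip().lower()
    return THREE_TO_TWO.get(normalized, normalized)


def turtle_literal(text: str, code: str) -> str:
    """Format a Turtle language-tagged literal (hypothetical helper)."""
    # Escape backslashes first, then double quotes, for a short-form string.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{shortest_language_subtag(code)}'


print(turtle_literal("Pancake", "eng"))      # "Pancake"@en
print(turtle_literal("Pfannkuchen", "deu"))  # "Pfannkuchen"@de
```

A code that is missing from the table, such as gsw (which has no two-letter equivalent), passes through unchanged, so three-letter subtags survive where no shorter one exists.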
Received on Tuesday, 20 December 2016 22:22:01 UTC