Re: Clarification about language tag

Dear Mario,

> In Turtle syntax the @lang tag syntax refers to BCP47 that states:
>
> language      = 2*3ALPHA            ; shortest ISO 639 code
>
> That is, the language code (I ignore all the variants here) should be 2  
> or 3 characters.

This means you should use the two-letter code for a language that has one
(@en), even if it also has a three-letter code (@eng). Not every language
has a two-letter code, and for those the three-letter code is the right one.

> Indeed ISO 639 (http://www.loc.gov/standards/iso639-2/php/code_list.php)  
> lists both 2 and 3 chars codes (e.g., English: 'en' and 'eng').
>
> But in all Turtle examples I have found the language code has 2 chars.  
> Is it a requirement or is simply a tradition? This means, could I write  
> "Pancake"@eng?
>
> The question arises because WordNet contains 3 chars codes, so to  
> transform into triples, should/shouldn't I convert it to 2 characters?

The reason for the 3-letter codes there is that the 2-letter codes are
insufficient from the perspective of multilingual NLP or linguistics, where
ISO 639-3 is much more established (and somewhat better defined) than the
ISO 639-1 2-letter codes. Therefore, people developing language resources
(like WordNet) sometimes neglect ISO 639-1 codes altogether. I have done so
myself at times. In terms of BCP47, however, this is a mistake and should be
fixed. As long as you work only with modern-day major languages and see no
issues with the 2-letter codes for your task/resource, you should
definitely follow BCP47 and use 2-letter codes wherever possible.
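
For the conversion itself, something along the following lines might do. It
is only a rough sketch in Python: the mapping table is a tiny hand-picked
subset that I made up for illustration (a real conversion would need the
full ISO 639 tables or the IANA language subtag registry), and the function
names shortest_tag and turtle_literal are hypothetical, not part of any
library. Codes without an entry in the table are passed through unchanged,
which is what you want for languages that have no 2-letter code (e.g. grc).

```python
# Illustrative subset only (assumption): a real conversion needs the
# complete ISO 639 tables or the IANA language subtag registry.
ISO639_3_TO_1 = {
    "eng": "en",
    "deu": "de",
    "fra": "fr",
    "ita": "it",
    "spa": "es",
}


def shortest_tag(code: str) -> str:
    """Return the 2-letter code if the table has one, else the code itself."""
    code = code.strip().lower()
    return ISO639_3_TO_1.get(code, code)


def turtle_literal(text: str, code: str) -> str:
    """Format a language-tagged Turtle literal.

    Only backslashes and double quotes are escaped here; other characters
    that need escaping in Turtle (e.g. newlines) are not handled.
    """
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{shortest_tag(code)}'


print(turtle_literal("Pancake", "eng"))  # "Pancake"@en
print(turtle_literal("plakous", "grc"))  # "plakous"@grc (no 2-letter code)
```

Note that a code missing from the table is also passed through, so with an
incomplete table you would silently keep 3-letter codes that do have a
2-letter equivalent.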

Best,
Christian

>
> Thanks for your patience
>
> 				mario
>


-- 
Prof. Dr. Christian Chiarcos
Applied Computational Linguistics
Johann Wolfgang Goethe Universität Frankfurt a. M.
60054 Frankfurt am Main, Germany

office: Robert-Mayer-Str. 10, #401b
mail: chiarcos@informatik.uni-frankfurt.de
web: http://acoli.cs.uni-frankfurt.de
tel: +49-(0)69-798-22463
fax: +49-(0)69-798-28931

Received on Tuesday, 20 December 2016 17:19:15 UTC