Re: telco this Friday from Felix Sasaki on 2014-01-30 (public-ontolex@w3.org from January 2014)

From: Felix Sasaki <fsasaki@w3.org>
Date: Thu, 30 Jan 2014 09:18:25 +0100
To: Gil Francopoulo <gil.francopoulo@wanadoo.fr>, public-ontolex@w3.org
Message-ID: <52EA0AD1.80100@w3.org>
Hi Gil, all,

On 30.01.14 09:12, Gil Francopoulo wrote:
> Dear Philip and Lars,
>
> I agree with Lars.
>
> I suggest taking a look at (and following) IETF BCP 47 in the examples

+1.

> , where:
>
> * a language code is never in upper-case but in lower-case,

Both would be fine according to BCP 47: tags are case-insensitive, and
lower-case language subtags are only a recommended convention.

> * a country code is always in upper-case and respects ISO-3166-1

see above.

> * this allows plain codes like eng (when no further detail is
> needed) but also permits refinements like eng-US or eng-UK.

eng-US is not a valid BCP 47 language tag, since BCP 47 requires the use of
the two-letter code if one is available (the valid tag would be en-US); see
http://tools.ietf.org/html/bcp47#section-2.2.1
" When languages have both an ISO 639-1 two-character code and a three-
    character code (assigned by ISO 639-2, ISO 639-3, or ISO 639-5), only
    the ISO 639-1 two-character code is defined in the IANA registry."
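
As a rough illustration of the two points above (casing carries no
meaning, and the two-letter code is preferred where one exists), here is
a minimal Python sketch. The function names and the tiny mapping table
are made up for this example; a real implementation would consult the
IANA Language Subtag Registry, and the sketch only handles a language
subtag with an optional two-letter region.

```python
# Hypothetical, deliberately tiny mapping from ISO 639-3 codes to their
# ISO 639-1 equivalents. This is an assumption for illustration only;
# the authoritative source is the IANA Language Subtag Registry.
THREE_TO_TWO = {"eng": "en", "spa": "es", "deu": "de", "fra": "fr"}


def normalize_tag(tag: str) -> str:
    """Return a simple language[-region] tag in the recommended casing.

    Only a primary language subtag, optionally followed by a two-letter
    region subtag, is supported. Scripts, variants and extensions are
    out of scope for this sketch.
    """
    parts = tag.split("-")
    language = parts[0].lower()
    # Prefer the two-letter code when the language has one.
    language = THREE_TO_TWO.get(language, language)
    if len(parts) == 1:
        return language
    region = parts[1]
    if len(parts) == 2 and len(region) == 2 and region.isascii() and region.isalpha():
        return f"{language}-{region.upper()}"
    raise ValueError(f"unsupported tag shape: {tag!r}")


def same_tag(a: str, b: str) -> bool:
    """Compare two tags case-insensitively."""
    return a.lower() == b.lower()
```

Under these assumptions, normalize_tag("ENG-us") gives "en-US", while
"yue" is left alone since Cantonese has no two-letter code.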

- Felix

> * following ISO-639-3 gives access to a larger range of values than ISO-639-1
> * IMHO nobody follows ISO-639-2 nowadays (it was something of a failed experiment)
> * ISO-639-6 is not used
>
> Hoping that helps,
> Gil
>
>
> On 30/01/2014 08:44, Lars Borin wrote:
>> Dear all,
>>
>>>
>>>
>>>     Other than that I wanted to clarify one issue regarding language
>>>     codes in the example.
>>>
>>>     I have seen that some people (John?) have started to use the ISO
>>>     639-2 codes (e.g. "ENG" for English, "SPA" for Spanish etc.).
>>>     I would propose we stick to the two-letter ISO 639-1
>>>     codes (e.g. "EN", "ES") etc. There is no particular reason for
>>>     this other than the fact that most people know these codes.
>>>
>>>     If the argument is recency and reusing the newest standard, then
>>>     we would have to go anyway for four letter codes according to
>>>     ISO 639-6.
>>>
>>>
>>> In the Open Multilingual Wordnet we use the three-letter codes
>>> because there are people working on languages which do not have
>>> two-letter codes, such as Abui (abz), Minangkabau (min) or Cantonese
>>> (yue). Note that some of these are large language communities;
>>> Minangkabau has around 6 million speakers. I think this is a strong
>>> argument for not going back to the two-letter codes.
>>
>> I suspect that the three-letter codes in question are intended to be 
>> ISO 639-3 (and not 639-2), the use of which is pretty much best 
>> practice in linguistics today (even if there is quite a bit of 
>> discussion about how well it reflects linguistic descriptive practice
>> and actual reality; see, e.g., <http://dlc.hypotheses.org/610>), 
>> because of coverage (not even all the languages of Europe are covered 
>> by 639-1, e.g. the two Sorbian languages) and because of granularity: 
>> The "language" level of ISO 639-3 (basically that of the Ethnologue) 
>> will not be included in 639-6, so there won't be a way of saying 
>> "English", since 639-3 already provides one, but you will be able to 
>> say (or, rather, propose codes for), e.g., "Elizabethan English", 
>> "Modern Australian English", etc.
>>
>> Best
>> Lars
>>
>> -- 
>> «Null hull,» sa Harry    | – Bögga? sagði Erlendur. Er það orð? |
>> (Jo Nesbø: Kakerlakkene) | (Arnaldur Indriðason: Mýrin)         |
>> --
>> Se aikainen matohan nokitaan!
>> (Reijo Mäki: Uhkapelimerkki)
>> ----
>> Lars Borin
>> Språkbanken • Centre for Language Technology
>> Institutionen för svenska språket
>> Göteborgs universitet
>> Box 200
>> SE-405 30 Göteborg
>> Sweden
>>
>> office +46 (0)31 786 4544
>> mobile +46 (0)70 747 8386
>>
>> <http://språkbanken.gu.se/personal/lars/>
>
Received on Thursday, 30 January 2014 08:18:51 UTC