Re: telco this Friday from Gil Francopoulo on 2014-01-30 (public-ontolex@w3.org from January 2014)

From: Gil Francopoulo <gil.francopoulo@wanadoo.fr>
Date: Thu, 30 Jan 2014 09:37:55 +0100
To: Felix Sasaki <fsasaki@w3.org>, public-ontolex@w3.org
Message-ID: <52EA0F63.1000307@wanadoo.fr>
On 30/01/2014 09:18, Felix Sasaki wrote:
> Hi Gil, all,
>
> On 30.01.14 09:12, Gil Francopoulo wrote:
>> Dear Philip and Lars,
>>
>> I agree with Lars.
>>
>> I suggest taking a look at (and following) IETF BCP 47 in the examples
>
> +1.
>
>> , where:
>>
>> * a language code is always in lower-case, never in upper-case,
>
> both would be fine according to BCP 47 - it is case insensitive.
>
>> * a country code is always in upper-case and respects ISO-3166-1
>
> see above.

ok, but in the ISO lists, language codes are always lower-case and 
country codes are always upper-case.

And from http://tools.ietf.org/search/bcp47, section 2.1.1:

The ABNF syntax also does not distinguish between upper- and
    lowercase: the uppercase US-ASCII letters in the range 'A' through
    'Z' are always considered equivalent and mapped directly to their US-
    ASCII lowercase equivalents in the range 'a' through 'z'.  So the tag
    "I-AMI" is considered equivalent to that value "i-ami" in the
    'irregular' production.

    Although case distinctions do not carry meaning in language tags,
    consistent formatting and presentation of language tags will aid
    users.  The format of subtags in the registry is RECOMMENDED as the
    form to use in language tags.  This format generally corresponds to
    the common conventions for the various ISO standards from which the
    subtags are derived.

    These conventions include:

    o  [ISO639-1] recommends that language codes be written in lowercase
       ('mn' Mongolian).

    o  [ISO15924] recommends that script codes use lowercase with the
       initial letter capitalized ('Cyrl' Cyrillic).

    o  [ISO3166-1] recommends that country codes be capitalized ('MN'
       Mongolia).
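The casing conventions quoted above can be illustrated with a minimal Python sketch. It assumes a simple tag made only of a language subtag, an optional script subtag and an optional region subtag (no variants, extensions or private-use subtags), and the function name format_tag_case is hypothetical, not part of any library. Because tags are compared case-insensitively, the function only changes presentation, never meaning.

```python
def format_tag_case(tag: str) -> str:
    """Apply the recommended BCP 47 casing to a simple language tag.

    Assumption: the tag is language[-script][-region] only; variants,
    extensions and private-use subtags are not handled specially.
    """
    subtags = tag.split("-")
    result = [subtags[0].lower()]  # language subtag: lowercase ('mn')
    for sub in subtags[1:]:
        if len(sub) == 4 and sub.isalpha():
            result.append(sub.title())  # script: initial capital ('Cyrl')
        elif len(sub) == 2 and sub.isalpha():
            result.append(sub.upper())  # country/region: uppercase ('MN')
        else:
            result.append(sub.lower())  # anything else: lowercase
    return "-".join(result)


def same_tag(a: str, b: str) -> bool:
    """Case distinctions carry no meaning, so compare case-insensitively."""
    return a.lower() == b.lower()
```

For example, format_tag_case("MN-cyrl-mn") gives "mn-Cyrl-MN", and "I-AMI" becomes "i-ami", matching the equivalence described in the quoted ABNF note.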

>
>> * this allows combinations like eng (when no further detail is
>> needed) while permitting more precise tags like eng-US or eng-UK.
>
> eng-US is not a valid BCP 47 language tag, since BCP 47 requires the use
> of the two-letter code if one is available, see
> http://tools.ietf.org/html/bcp47#section-2.2.1
> " When languages have both an ISO 639-1 two-character code and a three-
>    character code (assigned by ISO 639-2, ISO 639-3, or ISO 639-5), only
>    the ISO 639-1 two-character code is defined in the IANA registry."

You are right.
Gil
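Felix's point, together with the coverage argument in Lars's message quoted below, amounts to a simple rule: use the ISO 639-1 two-letter code when one exists, otherwise keep the three-letter ISO 639-3 code. A minimal Python sketch follows. The six-entry table THREE_TO_TWO is a hand-picked excerpt assumed for illustration (the authoritative data is the IANA Language Subtag Registry), and primary_language_subtag and make_tag are hypothetical names.

```python
from typing import Optional

# Hypothetical excerpt, for illustration only: three-letter ISO 639-2/639-3
# codes that also have an ISO 639-1 two-letter code. A real tool would load
# the IANA Language Subtag Registry instead of hard-coding entries.
THREE_TO_TWO = {
    "eng": "en",  # English
    "spa": "es",  # Spanish
    "fra": "fr",  # French
    "deu": "de",  # German
    "swe": "sv",  # Swedish
    "mon": "mn",  # Mongolian
}


def primary_language_subtag(code: str) -> str:
    """Prefer the ISO 639-1 two-letter code; otherwise keep the
    three-letter code (e.g. 'yue', 'abz', 'min', 'hsb')."""
    code = code.lower()  # language subtags are conventionally lowercase
    return THREE_TO_TWO.get(code, code)


def make_tag(language: str, region: Optional[str] = None) -> str:
    """Build a simple language[-region] tag using the recommended casing."""
    tag = primary_language_subtag(language)
    if region:
        tag += "-" + region.upper()  # ISO 3166-1 country codes are uppercase
    return tag
```

With this table, make_tag("eng", "US") yields "en-US" rather than "eng-US", while codes with no two-letter equivalent, such as yue (Cantonese) or hsb (Upper Sorbian), pass through unchanged.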

>
> - Felix
>
>> * follow ISO-639-3 to access a larger range of values than
>> ISO-639-1,
>> * IMHO nobody follows ISO-639-2 nowadays (it was something of a failed attempt),
>> * ISO-639-6 is not used.
>>
>> Hoping that helps,
>> Gil
>>
>>
>> On 30/01/2014 08:44, Lars Borin wrote:
>>> Dear all,
>>>
>>>>
>>>>
>>>>     Other than that, I wanted to clarify one issue regarding
>>>>     language codes in the examples.
>>>>
>>>>     I have seen that some people (John?) have started to use the
>>>>     ISO 639-2 codes (e.g. "ENG" for English, "SPA" for Spanish etc.).
>>>>     I would propose we stick to the two-letter ISO 639-1
>>>>     codes (e.g. "EN", "ES", etc.). There is no particular reason for
>>>>     this other than the fact that most people know these codes.
>>>>
>>>>     If the argument is recency and reusing the newest standard,
>>>>     then we would have to go for the four-letter codes of
>>>>     ISO 639-6 anyway.
>>>>
>>>>
>>>> In the Open Multilingual Wordnet we use the three-letter codes
>>>> because there are people working on languages which do not have
>>>> two-letter codes, such as Abui (abz), Minangkabau (min) or Cantonese
>>>> (yue). Note that some of these are large language communities:
>>>> Minangkabau has around 6 million speakers. I think this is a strong
>>>> argument for not going back to the two-letter codes.
>>>
>>> I suspect that the three-letter codes in question are intended to be 
>>> ISO 639-3 (and not 639-2), the use of which is pretty much best 
>>> practice in linguistics today (even if there is quite a bit of 
>>> discussion about how well it reflects linguistic descriptive
>>> practice and actual reality; see, e.g., 
>>> <http://dlc.hypotheses.org/610>), because of coverage (not even all 
>>> the languages of Europe are covered by 639-1, e.g. the two Sorbian 
>>> languages) and because of granularity: The "language" level of ISO 
>>> 639-3 (basically that of the Ethnologue) will not be included in 
>>> 639-6, so there won't be a way of saying "English", since 639-3 
>>> already provides one, but you will be able to say (or, rather, 
>>> propose codes for), e.g., "Elizabethan English", "Modern Australian 
>>> English", etc.
>>>
>>> Best
>>> Lars
>>>
>>> -- 
>>> «Null hull,» sa Harry    | – Bögga? sagði Erlendur. Er það orð? |
>>> (Jo Nesbø: Kakerlakkene) | (Arnaldur Indriðason: Mýrin)         |
>>> --
>>> Se aikainen matohan nokitaan!
>>> (Reijo Mäki: Uhkapelimerkki)
>>> ----
>>> Lars Borin
>>> Språkbanken • Centre for Language Technology
>>> Institutionen för svenska språket
>>> Göteborgs universitet
>>> Box 200
>>> SE-405 30 Göteborg
>>> Sweden
>>>
>>> office +46 (0)31 786 4544
>>> mobile +46 (0)70 747 8386
>>>
>>> <http://språkbanken.gu.se/personal/lars/>
>>
>
Received on Thursday, 30 January 2014 08:38:22 UTC