- From: Felix Sasaki <fsasaki@w3.org>
- Date: Thu, 30 Jan 2014 09:18:25 +0100
- To: Gil Francopoulo <gil.francopoulo@wanadoo.fr>, public-ontolex@w3.org
- Message-ID: <52EA0AD1.80100@w3.org>
Hi Gil, all,

On 30.01.14 09:12, Gil Francopoulo wrote:
> Dear Philip and Lars,
>
> I agree with Lars.
>
> I suggest taking a look at (and following) IETF BCP 47 in the examples

+1.

> , where:
>
> * a language code is never in upper-case but in lower-case,

Both would be fine according to BCP 47: it is case insensitive.

> * a country code is always in upper-case and respects ISO-3166-1

See above.

> * this is to allow combinations like eng (when no detail is
>   needed) but permits precisions like eng-US or eng-UK.

eng-US is not a valid BCP 47 language tag, since BCP 47 requires the use of a two-letter code if one is available, see
http://tools.ietf.org/html/bcp47#section-2.2.1

"When languages have both an ISO 639-1 two-character code and a three-character code (assigned by ISO 639-2, ISO 639-3, or ISO 639-5), only the ISO 639-1 two-character code is defined in the IANA registry."

- Felix

> * to follow ISO-639-3 to access a larger range of values than ISO-639-1
> * IMHO nobody follows ISO-639-2 nowadays (it was a sort of wrong trial)
> * ISO-639-6 is not used
>
> Hoping that helps,
> Gil
>
>
> On 30/01/2014 08:44, Lars Borin wrote:
>> Dear all,
>>
>>>
>>>
>>> Other than that I wanted to clarify one issue regarding language
>>> codes in the example.
>>>
>>> I have seen that some people (John?) have started to use the ISO
>>> 639-2 codes (e.g. "ENG" for English, "SPA" for Spanish etc.).
>>> I would propose we stick to the two-letter ISO 639-1
>>> codes (e.g. "EN", "ES") etc. There is no particular reason for
>>> this other than the fact that most people know these codes.
>>>
>>> If the argument is recency and reusing the newest standard, then
>>> we would have to go anyway for four letter codes according to
>>> ISO 639-6.
>>>
>>>
>>> In the open multilingual wordnet we use the three letter codes
>>> because there are people working on languages which do not have two
>>> letter codes, such as Abui (abz), Minangkabau (min) or Cantonese
>>> (yue). Note that some of these are large language communities,
>>> Minangkabau has around 6 million speakers. I think this is a strong
>>> argument for not going back to the two letter codes.
>>
>> I suspect that the three-letter codes in question are intended to be
>> ISO 639-3 (and not 639-2), the use of which is pretty much best
>> practice in linguistics today (even if there is quite a bit of
>> discussion about how well it reflects linguistic descriptive practice
>> and actual reality; see, e.g., <http://dlc.hypotheses.org/610>),
>> because of coverage (not even all the languages of Europe are covered
>> by 639-1, e.g. the two Sorbian languages) and because of granularity:
>> The "language" level of ISO 639-3 (basically that of the Ethnologue)
>> will not be included in 639-6, so there won't be a way of saying
>> "English", since 639-3 already provides one, but you will be able to
>> say (or, rather, propose codes for), e.g., "Elizabethan English",
>> "Modern Australian English", etc.
>>
>> Best
>> Lars
>>
>> --
>> «Null hull,» sa Harry     | – Bögga? sagði Erlendur. Er það orð? |
>> (Jo Nesbø: Kakerlakkene)  | (Arnaldur Indriðason: Mýrin)         |
>> --
>> Se aikainen matohan nokitaan!
>> (Reijo Mäki: Uhkapelimerkki)
>> ----
>> Lars Borin
>> Språkbanken • Centre for Language Technology
>> Institutionen för svenska språket
>> Göteborgs universitet
>> Box 200
>> SE-405 30 Göteborg
>> Sweden
>>
>> office +46 (0)31 786 4544
>> mobile +46 (0)70 747 8386
>>
>> <http://språkbanken.gu.se/personal/lars/>
>
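
To make the BCP 47 points in this thread concrete, here is a minimal Python sketch of how example tags could be normalized: case is folded to the conventional form (lower-case language, upper-case region), and a three-letter code is replaced by its two-letter ISO 639-1 equivalent where one exists. The function name `normalize_tag` and the tiny `ISO_639_3_TO_1` table are assumptions made up for illustration; a real tool would consult the IANA Language Subtag Registry rather than a hand-written mapping, and would also handle scripts, variants and extensions.

```python
# Hypothetical, deliberately tiny mapping from three-letter codes to the
# ISO 639-1 two-letter codes. Illustration only; not a complete table.
ISO_639_3_TO_1 = {"eng": "en", "spa": "es", "deu": "de", "fra": "fr"}


def normalize_tag(tag: str) -> str:
    """Normalize a simple 'language' or 'language-region' tag.

    Only the two shapes discussed in the thread are supported.
    """
    parts = tag.split("-")
    if len(parts) > 2 or not all(parts):
        raise ValueError(f"unsupported tag shape: {tag!r}")

    # Tags are case insensitive; lower-case is the conventional form
    # for the language subtag.
    language = parts[0].lower()
    # Prefer the two-letter code when the language has one; codes such
    # as 'yue' or 'min' have no two-letter form and stay as they are.
    language = ISO_639_3_TO_1.get(language, language)

    if len(parts) == 1:
        return language

    # Upper-case is the conventional form for the region subtag.
    region = parts[1].upper()
    return f"{language}-{region}"


if __name__ == "__main__":
    for example in ["ENG", "eng-US", "EN-us", "yue", "min"]:
        print(example, "->", normalize_tag(example))
```

With this sketch, "eng-US" comes out as "en-US", while "yue" and "min" are left untouched because no two-letter code exists for them.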
Received on Thursday, 30 January 2014 08:18:51 UTC