- From: Felix Sasaki <fsasaki@w3.org>
- Date: Thu, 30 Jan 2014 09:46:07 +0100
- To: Gil Francopoulo <gil.francopoulo@wanadoo.fr>, public-ontolex@w3.org
- Message-ID: <52EA114F.4020006@w3.org>
On 30.01.14 09:37, Gil Francopoulo wrote:
> On 30/01/2014 09:18, Felix Sasaki wrote:
>> Hi Gil, all,
>>
>> On 30.01.14 09:12, Gil Francopoulo wrote:
>>> Dear Philip and Lars,
>>>
>>> I agree with Lars.
>>>
>>> I suggest to take a look (and follow) IETF BCP 47 in the examples
>>
>> +1.
>>
>>> , where:
>>>
>>> * a language code is never in upper-case but in lower-case,
>>
>> both would be fine according to BCP 47 - it is case insensitive.
>>
>>> * a country code is always in upper-case and respects ISO-3166-1
>>
>> see above.
>
> ok, but in the ISO lists, language codes are always lower-case and
> country codes are always upper-case.

Sure, I was just mentioning what is likely to be checked by a BCP 47
validator (I assume a lemon implementation would use existing code, see
e.g. http://www.langtag.net/ ), since what you cite below says "...is
RECOMMENDED" (= implementers are free to follow the recommendation or
not).

- Felix

>
> And in http://tools.ietf.org/search/bcp47, section 2.1.1
>
> The ABNF syntax also does not distinguish between upper- and
> lowercase: the uppercase US-ASCII letters in the range 'A' through
> 'Z' are always considered equivalent and mapped directly to their
> US-ASCII lowercase equivalents in the range 'a' through 'z'. So the
> tag "I-AMI" is considered equivalent to that value "i-ami" in the
> 'irregular' production.
>
> Although case distinctions do not carry meaning in language tags,
> consistent formatting and presentation of language tags will aid
> users. The format of subtags in the registry is RECOMMENDED as the
> form to use in language tags. This format generally corresponds to
> the common conventions for the various ISO standards from which the
> subtags are derived.
>
> These conventions include:
>
> o [ISO639-1 <http://tools.ietf.org/search/bcp47#ref-ISO639-1>]
>   recommends that language codes be written in lowercase
>   ('mn' Mongolian).
>
> o [ISO15924 <http://tools.ietf.org/search/bcp47#ref-ISO15924>]
>   recommends that script codes use lowercase with the initial letter
>   capitalized ('Cyrl' Cyrillic).
>
> o [ISO3166-1 <http://tools.ietf.org/search/bcp47#ref-ISO3166-1>]
>   recommends that country codes be capitalized ('MN' Mongolia).
>
>>
>>> * this is to allow combination like eng (when any detail is not
>>> needed) but permits precisions like eng-US or eng-UK.
>>
>> eng-US is not a BCP 47 language tag, since BCP 47 requires the use of
>> a two-letter code if available, see
>> http://tools.ietf.org/html/bcp47#section-2.2.1
>> "When languages have both an ISO 639-1 two-character code and a
>> three-character code (assigned by ISO 639-2, ISO 639-3, or ISO 639-5),
>> only the ISO 639-1 two-character code is defined in the IANA registry."
>
> You are right.
> Gil
>
>>
>> - Felix
>>
>>> * to follow ISO-639-3 to access to a larger range of values than
>>> ISO-639-1
>>> * IMHO nobody follows ISO-639-2 nowadays (it was a sort of wrong trial)
>>> * ISO-639-6 is not used
>>>
>>> Hoping that helps,
>>> Gil
>>>
>>>
>>> On 30/01/2014 08:44, Lars Borin wrote:
>>>> Dear all,
>>>>
>>>>>
>>>>> Other than that I wanted to clarify one issue regarding
>>>>> language codes in the example.
>>>>>
>>>>> I have seen that some people (John?) have started to use the
>>>>> ISO 639-2 codes (e.g. "ENG" for English, "SPA" for Spanish etc.).
>>>>> I would propose we stick to the two-letter ISO 639-1
>>>>> codes (e.g. "EN", "ES") etc. There is no particular reason for
>>>>> this other than the fact that most people know these codes.
>>>>>
>>>>> If the argument is recency and reusing the newest standard,
>>>>> then we would have to go anyway for four letter codes
>>>>> according to ISO 639-6.
>>>>>
>>>>>
>>>>> In the open multilingual wordnet we use the three letter codes
>>>>> because there are people working on languages which do not have
>>>>> two letter codes, such as Abui (abz), Minangkabau (min) or
>>>>> Cantonese (yue). Note that some of these are large language
>>>>> communities, Minangkabau has around 6 million speakers. I think
>>>>> this is a strong argument for not going back to the two letter codes.
>>>>
>>>> I suspect that the three-letter codes in question are intended to
>>>> be ISO 639-3 (and not 639-2), the use of which is pretty much best
>>>> practice in linguistics today (even if there is quite a bit of
>>>> discussion about how well it reflects linguistic descriptive
>>>> practice and actual reality; see, e.g.,
>>>> <http://dlc.hypotheses.org/610>), because of coverage (not even all
>>>> the languages of Europe are covered by 639-1, e.g. the two Sorbian
>>>> languages) and because of granularity: The "language" level of ISO
>>>> 639-3 (basically that of the Ethnologue) will not be included in
>>>> 639-6, so there won't be a way of saying "English", since 639-3
>>>> already provides one, but you will be able to say (or, rather,
>>>> propose codes for), e.g., "Elizabethan English", "Modern Australian
>>>> English", etc.
>>>>
>>>> Best
>>>> Lars
>>>>
>>>> --
>>>> «Null hull,» sa Harry | – Bögga? sagði Erlendur. Er það orð? |
>>>> (Jo Nesbø: Kakerlakkene) | (Arnaldur Indriðason: Mýrin) |
>>>> --
>>>> Se aikainen matohan nokitaan!
>>>> (Reijo Mäki: Uhkapelimerkki)
>>>> ----
>>>> Lars Borin
>>>> Språkbanken • Centre for Language Technology
>>>> Institutionen för svenska språket
>>>> Göteborgs universitet
>>>> Box 200
>>>> SE-405 30 Göteborg
>>>> Sweden
>>>>
>>>> office +46 (0)31 786 4544
>>>> mobile +46 (0)70 747 8386
>>>>
>>>> <http://språkbanken.gu.se/personal/lars/>
>>>
>>
>
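
The case handling discussed in this thread could be sketched roughly as follows. This is only an illustration, assuming the RECOMMENDED formatting from BCP 47 section 2.1.1: lowercase everywhere, except that two-letter subtags are uppercased and four-letter subtags are title-cased when they are neither the first subtag nor placed after a singleton. The function names `format_langtag` and `same_tag` are hypothetical, and the sketch does not check subtags against the IANA registry, so a well-formed but unregistered tag such as "eng-US" would pass through unchanged apart from case.

```python
def format_langtag(tag: str) -> str:
    """Apply the RECOMMENDED case conventions to a language tag.

    Assumption: the tag is already well-formed; no registry lookup is done.
    """
    formatted = []
    after_singleton = False  # True once a one-character subtag has been seen
    for position, subtag in enumerate(tag.split("-")):
        if position > 0 and not after_singleton and len(subtag) == 2:
            # Region-style subtag, e.g. 'MN' (Mongolia)
            formatted.append(subtag.upper())
        elif position > 0 and not after_singleton and len(subtag) == 4:
            # Script-style subtag, e.g. 'Cyrl' (Cyrillic)
            formatted.append(subtag.capitalize())
        else:
            # Language subtags, singletons, and everything after a singleton
            formatted.append(subtag.lower())
        if len(subtag) == 1:
            after_singleton = True
    return "-".join(formatted)


def same_tag(a: str, b: str) -> bool:
    """Compare two tags case-insensitively, since case carries no meaning."""
    return a.lower() == b.lower()


if __name__ == "__main__":
    for example in ["EN-us", "MN-cyrl-mn", "I-AMI", "en-ca-X-CA"]:
        print(example, "->", format_langtag(example))
```

A validator would accept any of the inputs above regardless of case; the formatting step only matters for consistent presentation in the examples.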
Received on Thursday, 30 January 2014 08:46:32 UTC