RE: [open-linguistics] Re: ISO 639 URIs

From: Ronan Power <Ronan@translation.ie>
Date: Fri, 7 Aug 2020 13:30:24 +0000
To: 'Felix Sasaki' <felix@sasakiatcf.com>, Christian Chiarcos <christian.chiarcos@web.de>
CC: "santhosh.thottingal@gmail.com" <santhosh.thottingal@gmail.com>, open-linguistics <open-linguistics@googlegroups.com>, Linked Data for Language Technology Community Group <public-ld4lt@w3.org>, "public-ontolex@w3.org" <public-ontolex@w3.org>
Message-ID: <VI1PR05MB3406D7EAA1FFD511DA267745CF490@VI1PR05MB3406.eurprd05.prod.outlook.com>
Hi, I have written to the group about this before:
I think it is important to realise that ISO 639-3 does indeed have its problems, not least the “apparent” mismatch between descriptors and tags. The alternatives and variants have the same problem, and it is confusing.
I adopted ISO 639-3 previously; however, in an application we were building I was forced to adopt a hybrid version that included all available language tags from all systems, and we allowed different “PROPER-NOUNS” in “any language” to be added as an altTag for any of those languages.
However, I think it is also important to realise that 639-3 does by far the best job of covering the widest range of languages. In my opinion, though, we are dealing with “spoken languages” here, even though many languages are accurately represented as written languages too. I sense some confusion between this and the similar language mappings widely used in HTML/XML, by some of the larger multinational localisation vendors, and in the “localisation industry status quo” in general, such as (lang_country) mappings like (ES_MX) or (EN_GB, EN_US). In addition, in my opinion, another principal issue here is the point of view of the culture defining the standard, i.e. a westernised, English-speaking point of view, which of course rests on assumptions about ISO text mappings and character sets. For example, where does something like “written Traditional Chinese vs Simplified Chinese” come into any of the systems referred to above?
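
To make the (lang_country) point concrete, here is a rough Python sketch of splitting such a tag into language, script and region parts. It is only an illustration under stated assumptions: the function name split_tag and the regular expression are hypothetical, the pattern is a deliberately simplified subset of BCP 47 (no variants, extensions or private-use subtags), and it does not check the parts against any registry. Tags such as zh-Hant and zh-Hans use the ISO 15924 script codes, which is one way the Traditional vs Simplified distinction can be written down.

```python
import re

# Simplified pattern (an assumption, not full BCP 47):
# language = 2-3 letters, optional script = 4 letters,
# optional region = 2 letters or 3 digits; "-" or "_" as separator.
_TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:[-_](?P<script>[A-Za-z]{4}))?"
    r"(?:[-_](?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def split_tag(tag):
    """Split a simple language tag into language, script and region parts."""
    match = _TAG.match(tag.strip())
    if match is None:
        raise ValueError(f"not a simple language tag: {tag!r}")
    script = match.group("script")
    region = match.group("region")
    return {
        # Conventional casing: language lower, script Title, region UPPER.
        "language": match.group("language").lower(),
        "script": script.title() if script else None,
        "region": region.upper() if region else None,
    }


print(split_tag("ES_MX"))
print(split_tag("zh-Hant"))
print(split_tag("zh-Hans-CN"))
```

The sketch only separates the parts; whether a given language code comes from ISO 639-1, 639-2 or 639-3 is exactly the question it leaves open.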

This really boils down to creating and agreeing on a source index of identifiers for languages, dialects, written languages and scripts. To my knowledge, no such system has yet been completed thoroughly.


That’s my 2 cents.  Please feel free to reach out to me if you feel strongly about this, and I apologise if I have offended anybody.

Kind regards
Ronan

From: Felix Sasaki [mailto:felix@sasakiatcf.com]
Sent: Friday 7 August 2020 12:32
To: Christian Chiarcos <christian.chiarcos@web.de>
Cc: santhosh.thottingal@gmail.com; open-linguistics <open-linguistics@googlegroups.com>; Linked Data for Language Technology Community Group <public-ld4lt@w3.org>; public-ontolex@w3.org
Subject: Re: [open-linguistics] Re: ISO 639 URIs

Dear Christian and all,

FYI and in case you have further comments, I brought this thread to the attention of the W3C i18n working group, see this issue
https://github.com/w3c/i18n-discuss/issues/13

Also, W3C has started work again on a draft about "language tags and locale identifiers"; see the editor's copy here
https://w3c.github.io/ltli/

That version also contains some guidance about working with language tags in the context of RDF, see
https://w3c.github.io/ltli/#ltli-language-information-in-uris-req


Feel free to provide feedback here or within the W3C GitHub, we'd be more than happy to take this into account.

Best,

Felix

On Wed, 8 Jul 2020 at 14:11, Christian Chiarcos <christian.chiarcos@web.de> wrote:
I think most people would prefer URIs that maintain the ISO acronym, because it is established and because it can still be interpreted if the URLs don't resolve anymore. But the Wikidata entry for Malayalam is interesting for another reason: It refers to the Publications Office of the European Union as the source of its ISO 639 identifiers, and they seem indeed to provide URIs for some form of ISO 639-based language identifiers, e.g., http://publications.europa.eu/resource/authority/language/AVA for ISO 639 ava. However, they seem to be focusing on ISO 639-2 languages only (e.g., they have neither http://publications.europa.eu/resource/authority/language/CSP for ISO 639-3 csp nor http://publications.europa.eu/resource/authority/language/AV for ISO 639-1 av), so this doesn't seem to be an appealing alternative either.
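
As a rough illustration of the URI pattern above, the following Python sketch builds a candidate URI from an ISO 639 code. It is a minimal sketch under assumptions: the function name eu_language_uri is hypothetical, the "base plus upper-cased code" rule is only inferred from the AVA example, and the function does not check whether the resulting URI actually resolves (for csp and av it reportedly does not).

```python
# Base taken from the example URI in this thread.
EU_LANGUAGE_BASE = "http://publications.europa.eu/resource/authority/language/"


def eu_language_uri(code):
    """Build a candidate authority URI for a 2- or 3-letter ISO 639 code.

    Assumption: the path segment is simply the upper-cased code, as in the
    AVA example. No lookup is done, so the URI may not exist.
    """
    code = code.strip()
    if not (code.isascii() and code.isalpha() and len(code) in (2, 3)):
        raise ValueError(f"not a 2- or 3-letter ISO 639 code: {code!r}")
    return EU_LANGUAGE_BASE + code.upper()


print(eu_language_uri("ava"))
```

Because the code itself stays visible in the URI, a reader can still interpret it even when the URL no longer resolves.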

Best,
Christian

On Wed, 8 Jul 2020 at 13:51, <santhosh.thottingal@gmail.com> wrote:
How about Wikidata (https://www.wikidata.org/)? For example, https://www.wikidata.org/wiki/Q36236 is for Malayalam and links to several identifiers.
Received on Friday, 7 August 2020 13:35:18 UTC

This archive was generated by hypermail 2.4.0 : Friday, 7 August 2020 13:35:19 UTC