Re: ISO 639 URIs from Christian Chiarcos on 2020-07-08 (public-ontolex@w3.org from July 2020)

From: Christian Chiarcos <christian.chiarcos@web.de>
Date: Wed, 08 Jul 2020 13:24:26 +0200
To: Gilles Sérasset <Gilles.Serasset@univ-grenoble-alpes.fr>
Cc: open-linguistics <open-linguistics@googlegroups.com>, "Linked Data for Language Technology Community Group" <public-ld4lt@w3.org>, "public-ontolex@w3.org" <public-ontolex@w3.org>
Message-ID: <op.0nfom0i1br5td5@kitaba>

On .07.2020, at 11:46, Gilles Sérasset  
<Gilles.Serasset@univ-grenoble-alpes.fr> wrote:

> Hi Christian, hi all,
>
> Wouldn’t it be nice if the lexvo.org domain was managed by a group of  
> persons from the LLOD area to provide linked data on the languages that  
> would be an aggregation of all the datasets you mentioned, along with  
> all “sameAs” relations?

Definitely. It would likely find support in this community (it certainly  
has mine), and as you describe it, creating it would not even be a big  
effort. The question, however, is how to make it sustainable and keep it  
alive (maintained and funded) in the long run.

> This solution will involve a dedicated team of maintainers (in the long  
> run) and a rather small infrastructure to provide the data (which could  
> be simply served from static files + content negotiation).

I think it would also require some kind of organizational commitment to  
keep it alive on a technical level. This would be one of the strengths of  
IANA or (maybe) SIL. There may be other alternatives to these, though.
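To illustrate how small the "static files + content negotiation" setup Gilles describes could be, here is a minimal sketch that picks a pre-generated file based on the HTTP Accept header. The file layout (`<code>.ttl`, `<code>.rdf`, `<code>.jsonld`, `<code>.html`) and the HTML default are assumptions for illustration only, not a description of any existing service; wildcards other than `*/*` are simply ignored.

```python
from __future__ import annotations

# Hypothetical mapping from media type to the extension of a pre-generated
# static file; the file layout is an assumption, not an existing service.
MEDIA_TYPES = {
    "text/turtle": "ttl",
    "application/rdf+xml": "rdf",
    "application/ld+json": "jsonld",
    "text/html": "html",
}
DEFAULT_TYPE = "text/html"


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Split an HTTP Accept header into (media type, q) pairs, best first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        prefs.append((media, q))
    # sorted() is stable, so equal q values keep their header order
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def choose_file(code: str, accept: str | None) -> str | None:
    """Return the static file to serve for a language code, or None (HTTP 406)."""
    for media, q in parse_accept(accept or DEFAULT_TYPE):
        if q <= 0:
            continue
        if media == "*/*":
            media = DEFAULT_TYPE
        if media in MEDIA_TYPES:
            return f"{code}.{MEDIA_TYPES[media]}"
    return None
```

For example, `choose_file("deu", "text/html;q=0.5, application/ld+json")` returns `"deu.jsonld"`, while an unsupported type such as `image/png` yields `None`, which a server would map to 406 Not Acceptable. In practice the same effect is usually obtained with web server configuration rather than custom code.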

> It assumes that the generation of URIs and accompanying data can be made  
> entirely automatically (which may not be the case if there are name  
> clashes among these).

ISO 639 codes should not clash  
(https://www.loc.gov/standards/iso639-2/iso639jac.html).
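Since the codes themselves are coordinated, URIs could in principle be minted automatically from the code alone; what may still diverge across datasets is the accompanying data. The sketch below assumes each source is available as a simple `{code: name}` table and uses a hypothetical base URI (`http://example.org/lang/`). It mints one URI per code and flags codes whose sources disagree on the name for manual review. Note that such disagreements may be harmless spelling variants rather than real clashes.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical base URI for an aggregated language dataset.
BASE = "http://example.org/lang/"


def mint_uris(sources: dict[str, dict[str, str]]):
    """Mint one URI per code from several {code: name} tables.

    Returns (uris, conflicts): uris maps each code to its URI; conflicts maps
    codes whose sources disagree on the name to the set of names seen, so a
    human can check them before publishing.
    """
    names: dict[str, set[str]] = defaultdict(set)
    for table in sources.values():
        for code, name in table.items():
            # Normalize the code so "DEU" and "deu" end up under one URI
            names[code.strip().lower()].add(name.strip())
    uris = {code: BASE + code for code in names}
    conflicts = {code: seen for code, seen in names.items() if len(seen) > 1}
    return uris, conflicts
```

With toy input such as `{"a": {"deu": "German"}, "b": {"DEU": "Deutsch"}}`, both entries collapse onto `http://example.org/lang/deu`, and `deu` is reported with the two differing names.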

> It also assumes that the different dataset licences allow for it (which  
> I am unsure regarding SIL…).

The terms of use (https://iso639-3.sil.org/code_tables/download_tables)  
permit commercial and non-commercial use with attribution and without  
modification, but require that "the product, system, or device does not  
provide a means to redistribute the code set."

I am not sure what this means. Clearly, lexvo and the datahub ISO tables  
provide a means to reconstruct the full code set, but apparently that  
hasn't been an issue in the last 10 years, partly because these are not  
verbatim copies.

> I also think that such an alternate dataset may be necessary for other  
> persons who will need to have more information attached to the language  
> they deal with (e.g. date annotations for historical languages,  
> geographical (space/time) annotation for all languages, etc.).

Absolutely. Glottolog has been a great step in this direction for minority  
languages, but for historical languages, nothing comparable really exists.  
But maybe let's keep the discussion about extending ISO 639 data (which is  
necessary along many dimensions) separate from the question of how to  
create sustainable identifiers. I could imagine existing organizations  
simply providing an RDF view of ISO 639-3 data, but anything beyond that  
probably requires external funding (and of course, this is something we  
can work towards, too).
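As a rough idea of what "just providing an RDF view on ISO 639-3 data" might involve, here is a minimal sketch that turns a tab-separated code table into N-Triples, emitting an `rdfs:label` and an `owl:sameAs` link to the corresponding lexvo URI. It assumes a header row containing at least the columns `Id` and `Ref_Name`, as in the tables SIL distributes (check the actual download before relying on this); the namespace `http://example.org/iso639-3/` is hypothetical. Publishing such a view would of course still be subject to the terms of use discussed above.

```python
from __future__ import annotations

import csv
import io

# Hypothetical namespace for the RDF view; not an existing service.
BASE = "http://example.org/iso639-3/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
LEXVO = "http://lexvo.org/id/iso639-3/"


def literal(text: str, lang: str = "en") -> str:
    """Serialize a string as an N-Triples language-tagged literal."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"@{lang}'


def table_to_ntriples(tab_text: str) -> list[str]:
    """Turn a tab-separated ISO 639-3 code table into N-Triples lines.

    Assumes a header row with at least the columns Id and Ref_Name.
    """
    # QUOTE_NONE: treat the file literally, since names may contain quotes
    rows = csv.DictReader(io.StringIO(tab_text), delimiter="\t",
                          quoting=csv.QUOTE_NONE)
    triples = []
    for row in rows:
        code = row["Id"].strip()
        subj = f"<{BASE}{code}>"
        triples.append(f"{subj} <{RDFS_LABEL}> {literal(row['Ref_Name'])} .")
        triples.append(f"{subj} <{OWL_SAME_AS}> <{LEXVO}{code}> .")
    return triples
```

Fed the two-line sample `"Id\tRef_Name\ndeu\tGerman\n"`, this produces one label triple and one `owl:sameAs` triple for `deu`. A real conversion would add further columns (Part 1/2 codes, scope, language type) and probably use an RDF library rather than string formatting.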

Best,
Christian

Received on Wednesday, 8 July 2020 11:24:43 UTC