- From: Christian Chiarcos <christian.chiarcos@web.de>
- Date: Wed, 08 Jul 2020 13:24:26 +0200
- To: Gilles Sérasset <Gilles.Serasset@univ-grenoble-alpes.fr>
- Cc: open-linguistics <open-linguistics@googlegroups.com>, "Linked Data for Language Technology Community Group" <public-ld4lt@w3.org>, "public-ontolex@w3.org" <public-ontolex@w3.org>
On .07.2020, at 11:46, Gilles Sérasset <Gilles.Serasset@univ-grenoble-alpes.fr> wrote:

> Hi Christian, hi all,
>
> Wouldn’t it be nice if the lexvo.org domain was managed by a group of
> persons from the LLOD area to provide linked data on the languages that
> would be an aggregation of all the datasets you mentioned, along with
> all “sameAs” relations ?

Definitely. It might find support in this community (definitely mine), and as you describe it, it would not even be a big effort to create. But the question is how to make it sustainable and keep it alive (maintained and funded) in the long run.

> This solution will involve a dedicated team of maintainers (on the long
> run) and a rather small infrastructure to provide the data (which could
> be simply served from static files + content negotiation).

I think it would also require some kind of organizational commitment to keep it alive on a technical level. This would be one of the strengths of IANA or (maybe) SIL. There may be other alternatives to these, though.

> It assumes that the generation of URIs and accompanying data can be made
> entirely automatically (which may not be the case if there are name
> clashes among these).

ISO 639 codes should not clash (https://www.loc.gov/standards/iso639-2/iso639jac.html).

> It also assumes that the different dataset licences allows for it (which
> I am unsure regarding SIL…).

The terms of use (https://iso639-3.sil.org/code_tables/download_tables) permit commercial and non-commercial use with attribution and without modification, but require that "the product, system, or device does not provide a means to redistribute the code set." I am not sure what this means. Clearly, lexvo and the datahub ISO tables provide a means to reconstruct the full code set, but apparently that has not been an issue in the last 10 years, also because these are not verbatim copies.
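To illustrate the "static files + content negotiation" setup mentioned in the quoted text, here is a minimal sketch of how a request's Accept header could be mapped to a pre-generated file for a language code. The file naming scheme (`deu.ttl`, `deu.rdf`, `deu.html`), the set of media types, and the function names are assumptions made for illustration only; they do not describe how lexvo.org or any existing service is actually set up.

```python
# Hypothetical mapping from media types to static file extensions.
MEDIA_TYPES = {
    "text/turtle": "ttl",
    "application/rdf+xml": "rdf",
    "text/html": "html",
}
DEFAULT_EXT = "html"  # assumed fallback when nothing acceptable is listed


def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, highest q first."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        entries.append((media_type, q))
    # sorted() is stable, so entries with equal q keep their header order.
    return sorted(entries, key=lambda entry: -entry[1])


def static_path(code, accept):
    """Pick the static file to serve for a language code such as 'deu'."""
    for media_type, q in parse_accept(accept):
        if q <= 0:
            continue
        if media_type in MEDIA_TYPES:
            return f"{code}.{MEDIA_TYPES[media_type]}"
        if media_type == "*/*":
            return f"{code}.{DEFAULT_EXT}"
    return f"{code}.{DEFAULT_EXT}"
```

In practice this logic would more likely live in the web server's configuration than in application code; the sketch only shows that the selection step itself is small.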
> I also think that such an alternate dataset may be necessary for other
> persons who will need to have more information attached to the language
> they deal with (e.g. date annotations for Historical languages,
> geographical (space/time) annotation for all languages, etc.).

Absolutely. Glottolog has been a great step in this direction for minority languages, but for historical languages, nothing really exists. But maybe let's separate the discussion about extending ISO 639 data (which is necessary on many dimensions) from the question of how to create sustainable identifiers. I could imagine existing organizations taking care of just providing an RDF view on ISO 639-3 data, but everything beyond that probably requires external funding (and of course, this is something we can work towards, too).

Best,
Christian
Received on Wednesday, 8 July 2020 11:24:43 UTC