- From: Christian Chiarcos <christian.chiarcos@web.de>
- Date: Tue, 07 Jul 2020 18:40:46 +0200
- To: open-linguistics <open-linguistics@googlegroups.com>
- Cc: "Linked Data for Language Technology Community Group" <public-ld4lt@w3.org>, "public-ontolex@w3.org" <public-ontolex@w3.org>
Dear all,

For almost a decade, the Linguistic Linked Open Data community has largely relied on http://www.lexvo.org/ for providing LOD-compliant language identifier URIs, esp. with respect to ISO 639-3. Unfortunately, Lexvo has drifted a bit out of sync with the official standard over the years (and when I tried to confirm this impression by checking one of the more recently created language tags, csp [Southern Ping Chinese], I found that lexvo was down). However, even if this is fixed, the synchronization issue will arise again, and as ISO 639 keeps developing (albeit at a slow pace), I was wondering whether we should consider a general shift from lexvo URIs to those provided by the official registration authorities.

For ISO 639-1 and ISO 639-2, this is the Library of Congress, and they provide

- a human-readable view (http://id.loc.gov/vocabulary/iso639-2/eng.html, resp. http://id.loc.gov/vocabulary/iso639-1/en.html; this is actually machine-readable, too: XHTML+RDFa!),
- a machine-readable view (e.g., http://id.loc.gov/vocabulary/iso639-1/en.nt, http://id.loc.gov/vocabulary/iso639-2/eng.nt), and
- content negotiation (http://id.loc.gov/vocabulary/iso639-2/eng, http://id.loc.gov/vocabulary/iso639-1/en, working at least for application/rdf+xml).

The problem here is ISO 639-3. The registration authority is SIL, and they do provide resolvable URIs, e.g., http://iso639-3.sil.org/code/eng. However, these are plain XHTML only, with nothing machine-readable (in particular, no mapping to the other ISO 639 standards). On the positive side, their URIs seem to be stable and also preserve deprecated/retired codes (https://iso639-3.sil.org/code/dud).

I'm wondering what people think. Basically, I see four alternatives to Lexvo URIs:

- Work with the current SIL URIs, even though these do not provide Linked Data.
- Approach SIL to provide an RDF dump (if not something more advanced) in addition to the HTML and TSV editions they currently provide.
- Approach IANA about an RDF edition of the BCP47 subtag registry (https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry). This contains a curated subset of ISO language tags and is supposed to be used in RDF anyway. [This has been suggested before: https://www.w3.org/wiki/Languages_as_RDF_Resources]
- Approach the Datahub team to provide an RDF view on their CSV collection of language codes (https://datahub.io/core/language-codes, harvested from LoC and the IANA subtag registry, but regularly updated).

What would be your preferences? Any other ideas?

In any case, if we're going to reach out to SIL, IANA or Datahub, we should be able to demonstrate that this is a request from a broader community, because it would require some effort on their part.

Best,
Christian

NB: Apologies for sending this to multiple mailing lists, but I think we should work towards a broader consensus for language resources in general here.
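The content-negotiation option for the Library of Congress URIs described in the message can be illustrated with a minimal Python sketch. It assumes only what the message reports: the two `id.loc.gov` base URIs, the `.nt`/`.html` document suffixes, and `application/rdf+xml` as a media type known to work. The helper names (`loc_document_uri`, `loc_language_request`, `fetch_rdf`) are hypothetical, and ISO 639-3 is deliberately rejected because SIL's pages offer no RDF.

```python
import urllib.request

# Base URIs of the Library of Congress ISO 639 vocabularies, as cited in the message.
LOC_BASE = {
    "iso639-1": "http://id.loc.gov/vocabulary/iso639-1/",
    "iso639-2": "http://id.loc.gov/vocabulary/iso639-2/",
}


def _base(scheme: str) -> str:
    """Return the vocabulary base URI, rejecting schemes LoC does not host."""
    try:
        return LOC_BASE[scheme]
    except KeyError:
        # ISO 639-3 is maintained by SIL, whose code pages are XHTML only.
        raise ValueError(f"no LoC vocabulary for scheme {scheme!r}") from None


def loc_document_uri(scheme: str, code: str, ext: str) -> str:
    """URI of a fixed serialization, e.g. 'nt' (N-Triples) or 'html' (XHTML+RDFa)."""
    return f"{_base(scheme)}{code}.{ext}"


def loc_language_request(scheme: str, code: str,
                         accept: str = "application/rdf+xml") -> urllib.request.Request:
    """Build, without sending, a content-negotiated request for a language URI.

    The extensionless URI is the one reported to support content negotiation;
    the Accept header asks the server for a particular RDF serialization.
    """
    return urllib.request.Request(_base(scheme) + code, headers={"Accept": accept})


def fetch_rdf(scheme: str, code: str, timeout: float = 10.0) -> bytes:
    """Send the request and return the raw response body (requires network access)."""
    with urllib.request.urlopen(loc_language_request(scheme, code), timeout=timeout) as resp:
        return resp.read()


if __name__ == "__main__":
    req = loc_language_request("iso639-2", "eng")
    print(req.full_url, req.get_header("Accept"))
    print(loc_document_uri("iso639-1", "en", "nt"))
```

Keeping request construction separate from `fetch_rdf` lets the URI and header logic be checked offline. Whether `id.loc.gov` honours other `Accept` values (e.g. `text/turtle`) would need to be verified against the live service.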
Received on Tuesday, 7 July 2020 16:41:14 UTC