Re: ISO 639 URIs

From: Felix Sasaki <felix@sasakiatcf.com>
Date: Wed, 8 Jul 2020 10:47:40 +0200
Message-ID: <CAL58czodXAjvMtoCsaWgc6WwMfs29S8994hUQ7hxHOpijcBLpQ@mail.gmail.com>
To: Christian Chiarcos <christian.chiarcos@web.de>
Cc: open-linguistics <open-linguistics@googlegroups.com>, Linked Data for Language Technology Community Group <public-ld4lt@w3.org>, "public-ontolex@w3.org" <public-ontolex@w3.org>

Dear Christian and all,

my preference would be "Approach IANA about an RDF edition of the BCP47
subtag registry".
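For illustration only, a minimal sketch of what an RDF edition of the subtag registry could start from. It assumes the registry's record-jar layout as described in RFC 5646, Section 3.1: records separated by "%%" lines, "Field: value" pairs, and folded continuation lines that begin with whitespace. The base URI http://example.org/subtag/, the predicate http://example.org/vocab#subtagType and all function names are hypothetical; this is not an IANA or W3C proposal.

```python
from typing import Dict, Iterator, List

# Hypothetical URIs for illustration; IANA publishes no RDF namespace.
BASE = "http://example.org/subtag/"
SUBTAG_TYPE = "http://example.org/vocab#subtagType"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

Record = Dict[str, List[str]]


def parse_record_jar(text: str) -> List[Record]:
    """Split record-jar text into records mapping field -> list of values.

    Values are lists because fields such as Description may repeat.
    """
    records: List[Record] = []
    current: Record = {}
    last_field = None
    for line in text.splitlines():
        if line.strip() == "%%":
            # Record separator: close the current record.
            if current:
                records.append(current)
            current, last_field = {}, None
        elif line[:1] in (" ", "\t") and last_field:
            # Folded continuation line: extend the previous value.
            current[last_field][-1] += " " + line.strip()
        elif ":" in line:
            field, value = line.split(":", 1)
            last_field = field.strip()
            current.setdefault(last_field, []).append(value.strip())
    if current:
        records.append(current)
    return records


def escape(s: str) -> str:
    """Escape backslashes and double quotes for an N-Triples literal."""
    return s.replace("\\", "\\\\").replace('"', '\\"')


def to_ntriples(records: List[Record]) -> Iterator[str]:
    for rec in records:
        # Skips the File-Date header and grandfathered/redundant
        # records, which carry Tag instead of Subtag.
        if "Subtag" not in rec or "Type" not in rec:
            continue
        kind = rec["Type"][0]
        subj = f"<{BASE}{kind}/{rec['Subtag'][0]}>"
        yield f'{subj} <{SUBTAG_TYPE}> "{escape(kind)}" .'
        for desc in rec.get("Description", []):
            yield f'{subj} <{RDFS_LABEL}> "{escape(desc)}" .'


# Illustrative input in record-jar layout (not a verbatim registry excerpt).
sample = """Type: language
Subtag: csp
Description: Southern Ping Chinese
%%
Type: region
Subtag: DE
Description: Germany
"""

if __name__ == "__main__":
    for triple in to_ntriples(parse_record_jar(sample)):
        print(triple)
```

Subject URIs are minted per type and subtag because the same string can occur as more than one subtag type (an extlang subtag such as yue also exists as a language subtag), so the subtag alone would not be a unique key.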

Btw., since our mail exchange about this topic a while ago, there has been
a discussion in the W3C i18n Working Group:
https://www.w3.org/2020/04/09-i18n-minutes.html#item06

At the moment, that group is working on guidance about language tags and
locale identifiers, into which RDF-related guidance would fit very well; see
https://www.w3.org/2020/07/02-i18n-minutes.html#item07

Best,

Felix

On Tue, 7 Jul 2020 at 18:40, Christian Chiarcos <christian.chiarcos@web.de>
wrote:

> Dear all,
>
> for almost a decade, the Linguistic Linked Open Data community has largely
> relied on http://www.lexvo.org/ for providing LOD-compliant language
> identifier URIs, esp. with respect to ISO 639-3. Unfortunately, this got
> out of sync with the official standard over the years (and when I tried to
> confirm this impression by checking one of the more recently created
> language tags, csp [Southern Ping Chinese], I found that Lexvo was down).
>
> However, even if this is fixed, the synchronization issue will arise
> again, because ISO 639 keeps developing (at a slow pace). So I was
> wondering whether we should consider a general shift from Lexvo URIs to
> those provided by the official registration authorities.
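If the community did make that shift, existing datasets would need their language URIs rewritten. A minimal sketch, assuming Lexvo language URIs follow the pattern http://lexvo.org/id/iso639-3/{code} (and iso639-1 for two-letter codes); the target patterns are the LoC and SIL URIs cited in this thread, and the function name migrate is hypothetical. As noted in the thread, the SIL target returns HTML only, so this changes identifiers without by itself providing Linked Data.

```python
import re
from typing import Optional

# Assumed Lexvo URI pattern, e.g. http://lexvo.org/id/iso639-3/eng
LEXVO = re.compile(r"^http://lexvo\.org/id/iso639-([13])/([a-z]{2,3})$")

# Target patterns taken from the URIs cited in this thread.
TARGETS = {
    "1": "http://id.loc.gov/vocabulary/iso639-1/{code}",  # Library of Congress
    "3": "http://iso639-3.sil.org/code/{code}",           # SIL
}


def migrate(uri: str) -> Optional[str]:
    """Return the registration-authority URI for a Lexvo language URI,
    or None if the URI does not match the assumed Lexvo pattern."""
    m = LEXVO.match(uri)
    if m is None:
        return None
    part, code = m.groups()
    return TARGETS[part].format(code=code)


if __name__ == "__main__":
    print(migrate("http://lexvo.org/id/iso639-3/csp"))
```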
>
> For ISO 639-1 and ISO 639-2, this is the Library of Congress, and they
> provide:
> - a human-readable view (http://id.loc.gov/vocabulary/iso639-2/eng.html,
> resp. http://id.loc.gov/vocabulary/iso639-1/en.html; this is actually
> machine-readable, too: XHTML+RDFa!),
> - a machine-readable view (e.g.,
> http://id.loc.gov/vocabulary/iso639-1/en.nt,
> http://id.loc.gov/vocabulary/iso639-2/eng.nt), and
> - content negotiation (http://id.loc.gov/vocabulary/iso639-2/eng,
> http://id.loc.gov/vocabulary/iso639-1/en, working at least for
> application/rdf+xml).
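A minimal sketch of the content negotiation route, using only Python's standard library. It builds the request without sending it; the default Accept value is application/rdf+xml because that is the format reported above as working, and other media types are untested here. The helper name loc_request and the rule "two letters means ISO 639-1, three letters means ISO 639-2" are illustrative assumptions.

```python
import urllib.request

LOC = "http://id.loc.gov/vocabulary/{scheme}/{code}"


def loc_request(code: str, accept: str = "application/rdf+xml") -> urllib.request.Request:
    """Build (but do not send) a content-negotiated request for a LoC code URI.

    Assumption: two-letter codes are ISO 639-1, three-letter codes ISO 639-2.
    """
    scheme = "iso639-1" if len(code) == 2 else "iso639-2"
    url = LOC.format(scheme=scheme, code=code)
    return urllib.request.Request(url, headers={"Accept": accept})


if __name__ == "__main__":
    req = loc_request("eng")
    print(req.full_url, req.get_header("Accept"))
    # Sending the request requires network access:
    # with urllib.request.urlopen(req, timeout=10) as resp:
    #     print(resp.headers.get("Content-Type"))
```

Clients that cannot set request headers can instead append .nt to the code URI, as in the machine-readable view listed above.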
>
> The problem here is ISO 639-3. The registration authority is SIL, and they
> do provide resolvable URIs, e.g., http://iso639-3.sil.org/code/eng.
> However, these return plain XHTML only, nothing machine-readable (in
> particular, not the mapping to the other ISO 639 standards). On the
> positive side, their URIs seem to be stable and to preserve
> deprecated/retired codes (e.g., https://iso639-3.sil.org/code/dud).
>
> I'm wondering what people think. Basically, I see four alternatives to
> Lexvo URIs:
> - Work with the current SIL URIs, even though these do not provide Linked
> Data.
> - Ask SIL to provide an RDF dump (if nothing more advanced) in addition to
> the HTML and TSV editions they currently provide.
> - Approach IANA about an RDF edition of the BCP47 subtag registry
> (https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry).
> This contains a curated subset of ISO language tags and is supposed to be
> used in RDF anyway (RDF language tags are BCP47 tags). [This has been
> suggested before: https://www.w3.org/wiki/Languages_as_RDF_Resources]
> - Ask the Datahub team to provide an RDF view on their CSV collection of
> language codes (https://datahub.io/core/language-codes, harvested from
> LoC and the IANA subtag registry, but regularly updated).
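To gauge the effort behind an "RDF dump" request, a rough sketch converting a tab-separated ISO 639-3 code table to N-Triples. It assumes the TSV header contains the columns Id, Part1 and Ref_Name (as in SIL's code table download; worth verifying against the current file), uses rdfs:label for the reference name, and links ISO 639-1 codes to the LoC URIs with skos:closeMatch. That predicate choice is an assumption, not an agreed modelling decision. ISO 639-2 links are left out because they would require choosing between the Part2B and Part2T columns. The function name is hypothetical.

```python
import csv
import io
from typing import Iterator

SIL = "http://iso639-3.sil.org/code/{}"
LOC_639_1 = "http://id.loc.gov/vocabulary/iso639-1/{}"
RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"
SKOS_CLOSE = "<http://www.w3.org/2004/02/skos/core#closeMatch>"  # assumed link predicate


def sil_tsv_to_ntriples(tsv_text: str) -> Iterator[str]:
    """Turn rows of an ISO 639-3 code table into N-Triples.

    Assumes the header names Id, Part1 and Ref_Name; other columns are ignored.
    """
    # QUOTE_NONE: treat double quotes in names as ordinary characters.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        subj = f"<{SIL.format(row['Id'])}>"
        name = row["Ref_Name"].replace("\\", "\\\\").replace('"', '\\"')
        yield f'{subj} {RDFS_LABEL} "{name}" .'
        if row.get("Part1"):
            yield f"{subj} {SKOS_CLOSE} <{LOC_639_1.format(row['Part1'])}> ."


# Illustrative rows in the assumed column layout.
sample = (
    "Id\tPart2B\tPart2T\tPart1\tScope\tLanguage_Type\tRef_Name\tComment\n"
    "eng\teng\teng\ten\tI\tL\tEnglish\t\n"
    "csp\t\t\t\tI\tL\tSouthern Ping Chinese\t\n"
)

if __name__ == "__main__":
    for triple in sil_tsv_to_ntriples(sample):
        print(triple)
```

A Datahub CSV export could be handled the same way by switching the delimiter and column names.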
>
> What would be your preferences? Any other ideas? In any case, if we're
> going to reach out to SIL, IANA or Datahub, we should be able to
> demonstrate that this is a request from a broader community, because it
> would mean some effort on their part.
>
> Best,
> Christian
>
> NB: Apologies for sending this to multiple mailing lists, but I think we
> should work towards a broader consensus for language resources in general
> here.
Received on Wednesday, 8 July 2020 08:48:08 UTC