Re: [open-linguistics] ISO 639 URIs from Robert Forkel on 2020-07-08 (public-ontolex@w3.org from July 2020)

From: Robert Forkel <xrotwang@googlemail.com>
Date: Wed, 8 Jul 2020 07:06:03 +0200
To: Christian Chiarcos <christian.chiarcos@web.de>
Cc: open-linguistics <open-linguistics@googlegroups.com>, Linked Data for Language Technology Community Group <public-ld4lt@w3.org>, "public-ontolex@w3.org" <public-ontolex@w3.org>
Message-ID: <CAJhx5RerEYoWTCTDsT7QtQmj50f42C08d14BsonMaDp1sF+P0g@mail.gmail.com>

A note on the downloadable data provided by SIL for the ISO 639-3
codes: For quite some time, one of the tables in the zip files provided
at https://iso639-3.sil.org/code_tables/639/data was broken: it contained
lines with inconsistent numbers of tabs but no content, which is the
reason for this line in my processing code:
https://github.com/clld/clldutils/blob/93d3789175103d6f60eb33ef7f4779177ec9993f/src/clldutils/iso_639_3.py#L52
I notified SIL about this but never got an answer. Given this, I
wouldn't set my hopes too high for an RDF dump provided by SIL.
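
For illustration, a minimal sketch of the kind of guard this calls for
when reading such a table. This is not the clldutils code linked above;
the function name read_tsv_rows and the two-column sample are
hypothetical, and the only assumption about the data is that broken
lines consist of tabs and whitespace alone.

```python
import csv
import io


def read_tsv_rows(text):
    """Yield rows from tab-separated text, skipping content-free lines.

    A line counts as content-free when every cell is empty or whitespace,
    which covers blank lines as well as lines made up of stray tabs.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not any(cell.strip() for cell in row):
            continue  # only tabs/whitespace: nothing to keep
        yield row


# Hypothetical sample: a header, two records, and two tab-only lines.
sample = "Id\tRef_Name\neng\tEnglish\n\t\t\n\t\ndeu\tGerman\n"
rows = list(read_tsv_rows(sample))
```

Rows that do carry content pass through unchanged, so a consumer that
needs a fixed column count would still have to check the row length.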

On Tue, Jul 7, 2020 at 7:19 PM Robert Forkel <xrotwang@googlemail.com> wrote:
>
> Just wanted to mention that the URLs of the form
> http://iso639-3.sil.org/code/eng are also a fairly recent development,
> and - as far as I know - did not come with any commitment of SIL to
> keep these stable. But then, they probably carry enough semantics to
> serve as a human-resolvable identifier even if they don't resolve for
> machines anymore.
>
> best
> robert
>
> On Tue, Jul 7, 2020 at 6:40 PM Christian Chiarcos
> <christian.chiarcos@web.de> wrote:
> >
> > Dear all,
> >
> > for almost a decade, the Linguistic Linked Open Data community has largely
> > relied on http://www.lexvo.org/ for providing LOD-compliant language
> > identifier URIs, esp. with respect to ISO 639-3. Unfortunately, this got
> > out of sync with the official standard over the years (and when I tried to
> > confirm this impression by checking one of the more recently created
> > language tags, csp [Southern Ping Chinese], I found that lexvo was down).
> >
> > However, even if this is fixed, the synchronization issue will arise
> > again, and as ISO 639 keeps developing (at a slow pace), I was wondering
> > whether we should not consider a general shift from lexvo URIs to those
> > provided by the official registration authorities.
> >
> > For ISO 639-1 and ISO 639-2, this is the Library of Congress, and they
> > provide
> > - a human-readable view (http://id.loc.gov/vocabulary/iso639-2/eng.html,
> > resp. http://id.loc.gov/vocabulary/iso639-1/en.html -- this is actually
> > machine-readable, too: XHTML+RDFa!),
> > - a machine-readable view (e.g.,
> > http://id.loc.gov/vocabulary/iso639-1/en.nt,
> > http://id.loc.gov/vocabulary/iso639-2/eng.nt), and
> > - content negotiation (http://id.loc.gov/vocabulary/iso639-2/eng,
> > http://id.loc.gov/vocabulary/iso639-1/en, working at least for
> > application/rdf+xml)
> >
> > The problem here is ISO 639-3. The registration authority is SIL and they
> > provide resolvable URIs, indeed, e.g., http://iso639-3.sil.org/code/eng.
> > However, this is plain XHTML only, nothing machine-readable (in particular
> > not the mapping to the other ISO 639 standards). On the positive side,
> > their URIs seem to be stable, and also to preserve deprecated/retired
> > codes (https://iso639-3.sil.org/code/dud).
> >
> > I'm wondering what people think. Basically, I see four alternatives to
> > Lexvo URIs:
> > - Work with current SIL URIs, even though these do not provide Linked Data.
> > - Approach SIL to provide an RDF dump (if not anything more advanced) in
> > addition to the HTML and TSV editions they currently provide.
> > - Approach IANA about an RDF edition of the BCP47 subtag registry
> > (https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry).
> > This contains a curated subset of ISO language tags and is supposed to be
> > used in RDF anyway. [This has been suggested before:
> > https://www.w3.org/wiki/Languages_as_RDF_Resources]
> > - Approach the Datahub team to provide an RDF view on their CSV collection
> > of language codes (https://datahub.io/core/language-codes, harvested from
> > LoC and the IANA subtag registry, but regularly updated)
> >
> > What would be your preferences? Any other ideas? In any case, if we're
> > going to reach out to SIL, IANA or Datahub, we should be able to
> > demonstrate that this is a request from a broader community, because it
> > would come with some effort for them.
> >
> > Best,
> > Christian
> >
> > NB: Apologies for sending this to multiple mailing lists, but I think we
> > should work towards a broader consensus for language resources in general
> > here.

Received on Wednesday, 8 July 2020 05:06:27 UTC