Re: lime:language vs. dct:language in OntoLex from Fahad Khan on 2023-06-12 (public-ontolex@w3.org from June 2023)

From: Fahad Khan <fahad.khan@ilc.cnr.it>
Date: Mon, 12 Jun 2023 12:37:12 +0200
To: Penny Labropoulou <penny@athenarc.gr>
Cc: Christian Chiarcos <christian.chiarcos@gmail.com>, public-ontolex <public-ontolex@w3.org>
Message-ID: <CAK+N+9iKKp6Aw9PPdvRsmtZG2+pZDeyJVPkDj=GDr3uFPZY78A@mail.gmail.com>
Dear Penny, all,
Your comments would make an excellent basis for a set of guidelines/best
practices covering different aspects of the creation of multilingual
resources in OntoLex. Indeed, given the general reluctance to publish new
versions of, or updates to, OntoLex/the W3C OntoLex report (something I
don't personally agree with :P), many practical details on how to use
OntoLex/lexicog to create lexicons, details which the report either doesn't
cover or covers ambiguously (or erroneously), might be laid out in a series
of OntoLex-specific guidelines.
Cheers
Fahad

On Sun, 11 Jun 2023 at 14:15, Penny Labropoulou <
penny@athenarc.gr> wrote:

> Dear all,
>
> +1 for Christian's suggestion to use dct:language in the diagram.
>
> Triggered by Gilles' comment (https://github.com/ontolex/ontolex/issues/37),
> however, I would like to raise another issue regarding the recommended
> range of the dct:language property. It has been discussed before in various
> contexts, but I don't know whether a final decision was reached.
>
> Currently we have the following recommendations:
>
>    - the DCMI recommended practice for dct:language (as Gilles correctly
>    mentions) is to use either a non-literal value representing a language
>    from a controlled vocabulary such as ISO 639-2 or ISO 639-3, or a literal
>    value consisting of an IETF Best Current Practice 47 [IETF-BCP47
>    <https://tools.ietf.org/html/bcp47>] language tag.
>    - OntoLex recommends either Lexvo.org <http://www.lexvo.org/> or the
>    Library of Congress vocabulary
>    <http://id.loc.gov/vocabulary/iso639-1.html> for the range of dct:language.
>    - DCAT (https://www.w3.org/TR/vocab-dcat-3/#Property:resource_language)
>    recommends: "Resources defined by the Library of Congress (ISO 639-1
>    <http://id.loc.gov/vocabulary/iso639-1.html>, ISO 639-2
>    <http://id.loc.gov/vocabulary/iso639-2.html>) *SHOULD* be used." Yet,
>    there's a note:
>       - *"Requirements for identification of natural language in linked
>       data specifications are evolving. Many applications use [BCP47
>       <https://www.w3.org/TR/vocab-dcat-3/#bib-bcp47>] language tags for this
>       purpose. ISO 639 also provides additional codes in ISO 639-3 which might be
>       required for some uses."*
>    - DCAT-AP
>    (https://joinup.ec.europa.eu/collection/semantic-interoperability-community-semic/solution/dcat-application-profile-data-portals-europe),
>    the application profile for European data portals, which is also very
>    popular, requires the use of the "EU Vocabularies Languages Named
>    Authority List" (http://publications.europa.eu/resource/authority/language).
>    - Many Linked Data resources (like DBnary, as Gilles points out) use
>    the Lexvo ontology (recommended by OntoLex), but the use of Glottolog
>    codes (https://glottolog.org/) is also a growing trend, and Wikidata also
>    provides values for languages.
>
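As a rough illustration of the value forms listed above, here is a minimal
Python sketch (standard library only) that prints candidate dct:language
objects for one language in several of the vocabularies mentioned. The IRI
patterns are assumptions based on the Lexvo, Library of Congress and EU
Publications Office URI schemes; the two-entry code table, the helper name
and the subject `:myLexicon` are hypothetical. Glottolog and Wikidata are left
out because their identifiers cannot be derived from ISO codes by pattern.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LangCodes:
    iso639_1: str  # two-letter code, e.g. "en"
    iso639_3: str  # three-letter code; same as ISO 639-2 for the sample below


# Illustrative sample only (assumption), not a real ISO 639 mapping table.
SAMPLE_CODES = {
    "en": LangCodes("en", "eng"),
    "it": LangCodes("it", "ita"),
}


def dct_language_candidates(tag: str) -> dict[str, str]:
    """Hypothetical helper: Turtle object terms that could fill dct:language."""
    c = SAMPLE_CODES[tag]
    return {
        # BCP47 language tag as a plain literal (the DCMI literal option)
        "bcp47_literal": f'"{c.iso639_1}"',
        # Lexvo.org IRI (recommended by OntoLex)
        "lexvo": f"<http://lexvo.org/id/iso639-3/{c.iso639_3}>",
        # Library of Congress IRIs (recommended by OntoLex and DCAT)
        "loc_iso639_1": f"<http://id.loc.gov/vocabulary/iso639-1/{c.iso639_1}>",
        "loc_iso639_2": f"<http://id.loc.gov/vocabulary/iso639-2/{c.iso639_3}>",
        # EU Vocabularies Languages NAL (required by DCAT-AP); upper-case codes
        "eu_nal": "<http://publications.europa.eu/resource/authority/language/"
                  f"{c.iso639_3.upper()}>",
    }


if __name__ == "__main__":
    for name, term in dct_language_candidates("en").items():
        # :myLexicon is a hypothetical subject used only for illustration
        print(f":myLexicon dct:language {term} .  # {name}")
```

The sample deliberately uses languages whose ISO 639-2 and ISO 639-3 codes
coincide; for Greek, for instance, the ISO 639-2 bibliographic code (gre)
differs from the ISO 639-3 code (ell), so a real table would need separate
columns.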
> As we all know, ISO 639 does not cover all "linguistic systems"
> (languages, dialects, regional varieties, etc.). For instance, in the
> context of the European Language Grid (ELG,
> https://live.european-language-grid.eu/) and the European Language
> Equality project (ELE, https://european-language-equality.eu/), we had to
> describe resources in languages/dialects that are not covered by ISO 639.
> For these, although the ELG catalogue initially used BCP47 tags, we decided
> to also include Glottolog codes, plus a free-text value for cases we could
> not map to either Glottolog or ISO 639 (e.g. Old Balkan (Centum)
> languages).
>
> As a community focusing on "language(s)", I think we should at least
> recommend a more detailed vocabulary for languages. BCP47 is already better
> than plain ISO 639 (which it extends). Among ontologies/controlled
> vocabularies, to the best of my knowledge, Glottolog has the broadest
> coverage and, where possible, includes mappings to ISO 639. If anyone
> knows of another one, please feel free to add it. In addition, if we could
> influence the enrichment of the ISO 639 codes, that would be even better.
>
> Apologies if this is not the place to raise this issue.
>
> Best,
> Penny
>
> ------------------------------
> *From:* Fahad Khan <fahad.khan@ilc.cnr.it>
> *Sent:* Saturday, June 10, 2023 20:24
> *To:* Christian Chiarcos <christian.chiarcos@gmail.com>
> *Cc:* public-ontolex <public-ontolex@w3.org>
> *Subject:* Re: lime:language vs. dct:language in OntoLex
>
> Dear all,
> I don't know whether we're supposed to respond here or on GitHub, but I
> definitely agree with Christian's least invasive proposal of using
> dct:language (with the intention of resolving the ambiguity between
> dct:language and lime:language throughout the document in later versions).
> In addition, one of the examples in the report that uses dct:language (the
> bank example in Section 3.3) has a slight error: it uses the namespace
> prefix odct instead of dct.
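A slip like odct for dct can be caught mechanically. Below is a minimal,
regex-based Python sketch that reports prefixes used in a Turtle snippet but
never declared with @prefix. It is not a real Turtle parser (it ignores long
strings, SPARQL-style PREFIX declarations and other corner cases), and the
function name and example snippet are hypothetical, not taken from the report.

```python
import re

# "@prefix name: <iri>" declarations; the name may be empty (the ":" prefix).
PREFIX_DECL = re.compile(r"@prefix\s+([A-Za-z][\w.-]*)?:\s*<[^>]*>")
# Pieces of Turtle that may contain ':' or '#' without being prefixed names.
IRI = re.compile(r"<[^>]*>")
STRING = re.compile(r'"(?:[^"\\]|\\.)*"')
COMMENT = re.compile(r"#[^\n]*")
# A prefixed name such as dct:language (non-empty prefix only).
PNAME = re.compile(r"(?<![\w:.-])([A-Za-z][\w-]*):(?=\w)")


def undeclared_prefixes(turtle: str) -> set[str]:
    """Return prefixes used in prefixed names but missing from @prefix lines."""
    declared = set(PREFIX_DECL.findall(turtle))
    body = IRI.sub("", turtle)     # drop IRIs first: they contain ':' and '#'
    body = STRING.sub('""', body)  # then string literals
    body = COMMENT.sub("", body)   # then comments
    used = set(PNAME.findall(body))
    return used - declared


# Hypothetical snippet reproducing the kind of error described above.
EXAMPLE = """
@prefix :        <http://example.org/lexicon#> .
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix dct:     <http://purl.org/dc/terms/> .

:lex_bank a ontolex:LexicalEntry ;
    odct:language <http://lexvo.org/id/iso639-3/eng> .
"""

if __name__ == "__main__":
    print(undeclared_prefixes(EXAMPLE))
```

Running it prints {'odct'}; replacing odct: with dct: in the snippet yields
an empty set.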
> Cheers
> Fahad
>
> On Sat, 10 Jun 2023 at 02:41, Christian Chiarcos <
> christian.chiarcos@gmail.com> wrote:
>
> Dear OntoLex community,
>
> In a discussion with Manuel Fiorelli, we recently spotted an issue with
> the core diagram: it seems to suggest a property "ontolex:language",
> whereas the text uses "dct:language" (core section) and
> "lime:language"/"dct:language" (lime section).
>
> Details under https://github.com/ontolex/ontolex/issues/37.
>
> The least invasive fix is to replace "language" in the diagram with the
> correct properties. My preference is to use "dct:language" (as in the
> examples in the core section). Alternatively, we might give both
> "dct:language" and "lime:language" (I'd find that confusing for a
> first-time user) or "lime:language" only (this contradicts the examples in
> the core section, unless these are fixed).
>
> As there are three options, the community might want to discuss them.
> My preference is to use only `dct:language` here, because it doesn't
> require other changes in the text and doesn't confuse first-time users.
>
> Best,
> Christian
>
>
Received on Monday, 12 June 2023 10:37:30 UTC