Re: lime:language vs. dct:language in OntoLex from Penny Labropoulou on 2023-06-15 (public-ontolex@w3.org from June 2023)

From: Penny Labropoulou <penny@athenarc.gr>
Date: Thu, 15 Jun 2023 09:08:22 +0000
To: Fahad Khan <fahad.khan@ilc.cnr.it>
CC: Christian Chiarcos <christian.chiarcos@gmail.com>, public-ontolex <public-ontolex@w3.org>
Message-ID: <VI1PR05MB468576F9A351EAB8E84485BDAB5BA@VI1PR05MB4685.eurprd05.prod.outlook.com>
Dear Fahad, all
thanks for your suggestion. Let's see if we can address this at least in the BPMLOD guidelines where appropriate.
Best,
Penny
________________________________
From: Fahad Khan <fahad.khan@ilc.cnr.it>
Sent: Monday, June 12, 2023 13:37
To: Penny Labropoulou <penny@athenarc.gr>
Cc: Christian Chiarcos <christian.chiarcos@gmail.com>; public-ontolex <public-ontolex@w3.org>
Subject: Re: lime:language vs. dct:language in OntoLex

Dear Penny, all,
Your comments would make an excellent basis for a set of guidelines/best practices covering different aspects of the creation of multilingual resources in OntoLex. Indeed, given the general reluctance to publish new versions or updates of OntoLex/the W3C OntoLex report (something I don't personally agree with :P), many practical details on how to use OntoLex/lexicog to create lexicons, which the report either does not cover or covers ambiguously (or erroneously), might be laid out in a series of OntoLex-specific guidelines.
Cheers
Fahad

On Sun, 11 Jun 2023 at 14:15, Penny Labropoulou <penny@athenarc.gr> wrote:
Dear all,

+1 for Christian's suggestion to use dct:language in the diagram.

Triggered by Gilles' comment (https://github.com/ontolex/ontolex/issues/37), however, I would like to raise another issue with regard to the recommended range of the dct:language property. It has been discussed before in various contexts, yet I don't know if there was a final outcome.

Currently we have the following recommendations:

  *   The recommended practice for dct:language (as Gilles correctly mentions) is to use either a non-literal value representing a language from a controlled vocabulary such as ISO 639-2 or ISO 639-3, or a literal value consisting of an IETF Best Current Practice 47 [IETF-BCP47<https://tools.ietf.org/html/bcp47>] language tag.
  *   OntoLex recommends, for the range of dct:language, either Lexvo.org<http://www.lexvo.org/> or the Library of Congress vocabulary<http://id.loc.gov/vocabulary/iso639-1.html>.
  *   DCAT (https://www.w3.org/TR/vocab-dcat-3/#Property:resource_language) recommends: "Resources defined by the Library of Congress (ISO 639-1<http://id.loc.gov/vocabulary/iso639-1.html>, ISO 639-2<http://id.loc.gov/vocabulary/iso639-2.html>) SHOULD be used." Yet, there's a note:
     *   "Requirements for identification of natural language in linked data specifications are evolving. Many applications use [BCP47<https://www.w3.org/TR/vocab-dcat-3/#bib-bcp47>] language tags for this purpose. ISO 639 also provides additional codes in ISO 639-3 which might be required for some uses."
  *   DCAT-AP (https://joinup.ec.europa.eu/collection/semantic-interoperability-community-semic/solution/dcat-application-profile-data-portals-europe, the application profile for European Data portals, which is also very popular) requires the use of the "EU Vocabularies Languages Named Authority List" (http://publications.europa.eu/resource/authority/language).
  *   Many Linked Data vocabularies (like DBnary, as Gilles points out) use the Lexvo ontology (recommended by OntoLex), but there is also growing use of the Glottolog codes (https://glottolog.org/), and there are also Wikidata values for languages.
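
To make the alternatives above concrete, here is a minimal sketch of what each one looks like as a dct:language value for English. The lexicon IRI is hypothetical, and the specific language identifiers (lexvo `eng`, Library of Congress `en`, EU authority `ENG`, glottocode `stan1293`) are chosen for illustration only and do not come from this thread.

```python
# Candidate values for dct:language, one per recommendation listed above.
# English is the example language; LEXICON is a hypothetical lexicon IRI.

DCT_LANGUAGE = "http://purl.org/dc/terms/language"
LEXICON = "http://example.org/lexicon/en"  # hypothetical

# Non-literal values (IRIs) taken from controlled vocabularies.
IRI_VALUES = {
    "lexvo (ISO 639-3)": "http://lexvo.org/id/iso639-3/eng",
    "Library of Congress (ISO 639-1)": "http://id.loc.gov/vocabulary/iso639-1/en",
    "EU Vocabularies NAL": "http://publications.europa.eu/resource/authority/language/ENG",
    "glottolog": "https://glottolog.org/resource/languoid/id/stan1293",
}

# Literal value: a BCP 47 language tag.
BCP47_TAG = "en"


def triple(subject: str, predicate: str, obj: str, literal: bool = False) -> str:
    """Serialize one statement as an N-Triples line."""
    value = f'"{obj}"' if literal else f"<{obj}>"
    return f"<{subject}> <{predicate}> {value} ."


def language_statements() -> list[str]:
    """Build one dct:language statement per candidate value."""
    lines = [triple(LEXICON, DCT_LANGUAGE, iri) for iri in IRI_VALUES.values()]
    lines.append(triple(LEXICON, DCT_LANGUAGE, BCP47_TAG, literal=True))
    return lines


if __name__ == "__main__":
    for line in language_statements():
        print(line)
```

All five statements describe the same language, which is the interoperability problem: a consumer that expects one form will not match the others without a mapping.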

As we all know, ISO 639 does not cater for all "linguistic systems" (languages, dialects, regional varieties, etc.). For instance, in the context of the European Language Grid (ELG, https://live.european-language-grid.eu/) and the European Language Equality (ELE, https://european-language-equality.eu/) we had to describe resources in languages/dialects that are not covered by ISO 639. For these, although in the ELG catalogue we initially used the BCP47 tags, we decided to also include the Glottolog codes and an additional free text value for cases we could not map to either Glottolog or ISO 639 (e.g. Old Balkan (Centum) languages).
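
The layered approach described above (a coded value where one exists, free text as a last resort) can be sketched roughly as follows. The record type, its field names, and the preference order are assumptions made for illustration; this is not the actual ELG metadata schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LanguageRef:
    """Hypothetical record; field names are illustrative, not the ELG schema."""
    bcp47: Optional[str] = None       # e.g. "en"
    glottocode: Optional[str] = None  # e.g. "stan1293"
    free_text: Optional[str] = None   # last resort when no code exists


def best_identifier(ref: LanguageRef) -> tuple[str, str]:
    """Return (scheme, value), preferring coded values over free text.

    The order BCP 47 > glottocode > free text is an assumption of this sketch.
    """
    if ref.bcp47:
        return ("bcp47", ref.bcp47)
    if ref.glottocode:
        return ("glottolog", ref.glottocode)
    if ref.free_text:
        return ("free-text", ref.free_text)
    raise ValueError("language reference is empty")


if __name__ == "__main__":
    print(best_identifier(LanguageRef(bcp47="en", glottocode="stan1293")))
    print(best_identifier(LanguageRef(free_text="Old Balkan (Centum) languages")))
```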

As a community focusing on "language(s)", I think we should at least recommend a more detailed vocabulary for languages. BCP47 is already better than ISO 639 alone (and extends it). Among ontologies/controlled vocabularies, to the best of my knowledge, Glottolog has the broadest coverage and, where possible, includes mappings to ISO 639. If anyone knows of another one, please feel free to add it. In addition, if we can influence the enrichment of ISO 639 codes, that would be even better.

Apologies if this is not the place to bring this issue.

Best,
Penny

________________________________
From: Fahad Khan <fahad.khan@ilc.cnr.it>
Sent: Saturday, June 10, 2023 20:24
To: Christian Chiarcos <christian.chiarcos@gmail.com>
Cc: public-ontolex <public-ontolex@w3.org>
Subject: Re: lime:language vs. dct:language in OntoLex

Dear all,
I don't know if we're supposed to respond here or on GitHub, but I definitely agree with Christian's least invasive proposal of using dct:language (with the intention of resolving the dct:language/lime:language ambiguity across the whole document in later versions). In addition, one of the examples in the report that uses dct:language (the bank example in Section 3.3) has a slight error: it uses the namespace odct instead of dct.
Cheers
Fahad

On Sat, 10 Jun 2023 at 02:41, Christian Chiarcos <christian.chiarcos@gmail.com> wrote:
Dear OntoLex community,

in a discussion with Manuel Fiorelli, we recently spotted an issue with the core diagram, in that it seems to suggest a property "ontolex:language" where the text uses "dct:language" (core section) and "lime:language"/"dct:language" (lime section), instead.

Details under https://github.com/ontolex/ontolex/issues/37.

The least invasive fix is to replace "language" in the diagram with the correct properties. My preference on that is to use "dct:language" (as in the examples in core section). Alternatively, we might give both "dct:language" and "lime:language" (I'd find that confusing for a first-time user) or "lime:language" only (this contradicts examples in core ... unless these are fixed).

As there are three options, people in the community might want to discuss them. My preference is to use only `dct:language` here, because it doesn't require other changes in the text and doesn't confuse first-time users.

Best,
Christian
Received on Thursday, 15 June 2023 09:08:30 UTC