- From: Fahad Khan <fahad.khan@ilc.cnr.it>
- Date: Mon, 12 Jun 2023 12:37:12 +0200
- To: Penny Labropoulou <penny@athenarc.gr>
- Cc: Christian Chiarcos <christian.chiarcos@gmail.com>, public-ontolex <public-ontolex@w3.org>
- Message-ID: <CAK+N+9iKKp6Aw9PPdvRsmtZG2+pZDeyJVPkDj=GDr3uFPZY78A@mail.gmail.com>
Dear Penny, all,

Your comments would make an excellent basis for a set of guidelines/best practices covering different aspects of the creation of multilingual resources in OntoLex. Indeed, given the general reluctance to publish new versions or updates of OntoLex/the W3C OntoLex report (something I don't personally agree with :P), a lot of practical details on how to use OntoLex/lexicog to create lexicons, which the report either doesn't cover or covers ambiguously (or erroneously), might be laid out in a series of OntoLex-specific guidelines.

Cheers
Fahad

On Sun, 11 Jun 2023 at 14:15, Penny Labropoulou <penny@athenarc.gr> wrote:

> Dear all,
>
> +1 for Christian's suggestion to use dct:language in the diagram.
>
> Triggered by Gilles' comment (https://github.com/ontolex/ontolex/issues/37), however, I would like to raise another issue with regard to the recommended range of the dct:language property. It's been discussed before in various contexts, yet I don't know if there's a final outcome.
>
> Currently we have the following recommendations:
>
> - dct:language recommended practice (as Gilles correctly mentions) is to use either a non-literal value representing a language from a controlled vocabulary such as ISO 639-2 or ISO 639-3, or a literal value consisting of an IETF Best Current Practice 47 [IETF-BCP47 <https://tools.ietf.org/html/bcp47>] language tag.
> - ontolex recommends for the range of dct:language either Lexvo.org <http://www.lexvo.org/> or The Library of Congress Vocabulary <http://id.loc.gov/vocabulary/iso639-1.html>
> - DCAT (https://www.w3.org/TR/vocab-dcat-3/#Property:resource_language) recommends: "Resources defined by the Library of Congress (ISO 639-1 <http://id.loc.gov/vocabulary/iso639-1.html>, ISO 639-2 <http://id.loc.gov/vocabulary/iso639-2.html>) *SHOULD* be used." Yet, there's a note:
>     - *"Requirements for identification of natural language in linked data specifications are evolving. Many applications use [BCP47 <https://www.w3.org/TR/vocab-dcat-3/#bib-bcp47>] language tags for this purpose. ISO 639 also provides additional codes in ISO 639-3 which might be required for some uses."*
> - DCAT-AP (https://joinup.ec.europa.eu/collection/semantic-interoperability-community-semic/solution/dcat-application-profile-data-portals-europe, the application profile for European Data portals, which is also very popular) requires the use of the "EU Vocabularies Languages Named Authority List" (http://publications.europa.eu/resource/authority/language).
> - Many Linked Data vocabularies (like dbnary, as Gilles points out) use the lexvo ontology (recommended by ontolex), but there's also a trending use of the glottolog codes (https://glottolog.org/) and there are also wikidata values for languages.
>
> As we all know, ISO 639 does not cater for all "linguistic systems" (languages, dialects, regional varieties, etc.). For instance, in the context of the European Language Grid (ELG, https://live.european-language-grid.eu/) and the European Language Equality (ELE, https://european-language-equality.eu/) we had to describe resources in languages/dialects that are not covered by ISO 639. For these, although in the ELG catalogue we initially used the BCP47 tags, we decided to include also the glottolog codes and an additional free text value for cases we could not map to either glottolog or ISO 639 (e.g. Old Balkan (Centum) languages).
>
> As a community focusing on "language(s)", I think we should at least recommend a more detailed vocabulary for languages. BCP47 is already better than just ISO 639 (and extends it). Among ontologies/controlled vocabularies, to the best of my knowledge, glottolog has the broadest coverage and, where possible, includes mappings to ISO 639. If anyone else knows of another one, please feel free to add. In addition, if we can influence the enrichment of ISO 639 codes, that would be even better.
>
> Apologies if this is not the place to bring up this issue.
>
> Best,
> Penny
>
> ------------------------------
> *From:* Fahad Khan <fahad.khan@ilc.cnr.it>
> *Sent:* Saturday, June 10, 2023 20:24
> *To:* Christian Chiarcos <christian.chiarcos@gmail.com>
> *Cc:* public-ontolex <public-ontolex@w3.org>
> *Subject:* Re: lime:language vs. dct:language in OntoLex
>
> Dear all,
> I don't know if we're supposed to respond here or on GitHub, but I definitely agree with Christian's least invasive proposal of using dct:language (with the intention of resolving the dct:language/lime:language ambiguity of the whole document in later versions). In addition, one of the examples which uses dct:language in the report (the bank example in Section 3.3) has a slight error: it uses the namespace odct instead of dct.
> Cheers
> Fahad
>
> On Sat, 10 Jun 2023 at 02:41, Christian Chiarcos <christian.chiarcos@gmail.com> wrote:
>
> Dear OntoLex community,
>
> in a discussion with Manuel Fiorelli, we recently spotted an issue with the core diagram, in that it seems to suggest a property "ontolex:language" where the text uses "dct:language" (core section) and "lime:language"/"dct:language" (lime section), instead.
>
> Details under https://github.com/ontolex/ontolex/issues/37.
>
> The least invasive fix is to replace "language" in the diagram with the correct properties. My preference on that is to use "dct:language" (as in the examples in core section). Alternatively, we might give both "dct:language" and "lime:language" (I'd find that confusing for a first-time user) or "lime:language" only (this contradicts examples in core ... unless these are fixed).
>
> As there are three options, people in the community might want to discuss.
> My preference is to use only `dct:language` here, because it doesn't require other changes in the text and doesn't confuse first-time users.
>
> Best,
> Christian
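
To make the competing recommendations for the value of dct:language in the thread above concrete, here is a rough Python sketch that emits one N-Triples statement per option for a single English lexicon. It is only an illustration, not something taken from the OntoLex report: the lexicon IRI (http://example.org/lexicon/en) and the helper names are hypothetical, and the language IRIs follow the patterns these vocabularies are generally understood to use, so they should be checked against each vocabulary before being relied on.

```python
# Illustrative sketch only: the lexicon IRI and helper names are hypothetical.
DCT_LANGUAGE = "http://purl.org/dc/terms/language"

# Candidate IRIs for English in the vocabularies mentioned in the thread.
# The IRI patterns are assumptions and should be verified per vocabulary.
LANGUAGE_IRIS = {
    "lexvo": "http://lexvo.org/id/iso639-3/eng",
    "loc_iso639_1": "http://id.loc.gov/vocabulary/iso639-1/en",
    "eu_authority": "http://publications.europa.eu/resource/authority/language/ENG",
    "glottolog": "https://glottolog.org/resource/languoid/id/stan1293",
}


def language_triple(subject_iri, value, as_literal=False):
    """Return one N-Triples line stating dct:language for a resource.

    With as_literal=True the value is written as a plain literal
    (e.g. a BCP47 tag); otherwise it is written as an IRI.
    """
    obj = f'"{value}"' if as_literal else f"<{value}>"
    return f"<{subject_iri}> <{DCT_LANGUAGE}> {obj} ."


def all_variants(subject_iri, bcp47_tag):
    """List the IRI-valued statements followed by the BCP47 literal one."""
    lines = [language_triple(subject_iri, iri) for iri in LANGUAGE_IRIS.values()]
    lines.append(language_triple(subject_iri, bcp47_tag, as_literal=True))
    return lines


if __name__ == "__main__":
    for line in all_variants("http://example.org/lexicon/en", "en"):
        print(line)
```

Each printed line describes the same lexicon with a different object for dct:language, which is the interoperability problem Penny describes: a consumer that expects Lexvo IRIs will not match a resource described with a BCP47 literal or a Glottolog IRI unless mappings between the vocabularies are supplied.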
Received on Monday, 12 June 2023 10:37:30 UTC