RE: Modelling relationship between lexical entries and spatial regions from Ilan Kernerman on 2019-05-09 (public-ontolex@w3.org from May 2019)

From: Ilan Kernerman <ilan@kdictionaries.com>
Date: Thu, 9 May 2019 11:29:03 +0000
To: Nicola Carboni <nicola.carboni@uzh.ch>, Julia Bosque Gil <jbosque@fi.upm.es>, Frances Gillis-Webber <fran@fynbosch.com>
CC: "public-ontolex@w3.org" <public-ontolex@w3.org>
Message-ID: <AM0PR03MB5074C1DD5B34061E373A6455CA330@AM0PR03MB5074.eurprd03.prod.outlook.com>
Hi Nicola

>>On another note, I searched a bit for the KD vocabulary extension you mentioned in the article but I could not find any links. Do you have one?

The Globalex 2016 proceedings, including the article mentioned, are available here:
http://ailab.ijs.si/globalex/files/2016/06/LREC2016Workshop-GLOBALEX_Proceedings-v2.pdf

Best
Ilan

From: Nicola Carboni [mailto:nicola.carboni@uzh.ch]
Sent: Thursday, May 09, 2019 9:55 AM
To: Julia Bosque Gil <jbosque@fi.upm.es>; Frances Gillis-Webber <fran@fynbosch.com>
Cc: public-ontolex@w3.org
Subject: Re: Modelling relationship between lexical entries and spatial regions

On 7 May 2019, at 15:52, Julia Bosque Gil <jbosque@fi.upm.es> wrote:

You can include information about a specific region, script or variant using language tags and subtags [1, 2, 3] with the ontolex:writtenRep. In that respect, the OntoLex Spec provides more links in this part:
Furthermore, we require that instances of the model adhere to the RDF 1.1 specification<http://www.w3.org/TR/rdf11-concepts/> and follow the appropriate guidelines. In particular, we require that language tags adhere to Best Common Practice 47<http://www.rfc-editor.org/rfc/bcp/bcp47.txt>, where tags are made up of a language code (based on ISO 639 codes part 1, 2, 3 or 5<http://www.iso.org/iso/home/standards/language_codes.htm>), optionally followed by a hyphen and a ISO 3166-1<http://www.iso.org/iso/iso-3166-1_decoding_table.html> country code. Language tags may also contain further subtags<https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry> expressing e.g. the region, script or further variants.
In the Lexicography page [4] (Issue 5) we discussed that ontolex:usage could be applied to cases in which you want to specify that a sense of an entry is attested in (only) a particular region. Another option is the use of lexvo:usedIn (with range lexvo:GeographicRegion) [5], but the property is described as "The property of a language or writing system [emphasis added] being used somewhat extensively in a particular geographical region at some point in time" (although the domain is not restricted). In our work with K Dictionaries in 2016 [6] we decided to create a custom property kd:geographicalUsage for this, but I would say that opting for the language tag option, whenever possible, would be preferable.
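As a rough illustration of the language tag option, here is a minimal Python sketch that assembles a tag from a language code plus optional script, region and variant subtags, and attaches it to an ontolex:writtenRep literal in Turtle. The function names and the example form URI are hypothetical, and the code only normalises case; it does not validate subtags against the IANA registry.

```python
def make_language_tag(language, script=None, region=None, variant=None):
    """Join subtags in BCP 47 order: language, script, region, variant.

    Only the conventional casing is applied (e.g. "sr-Latn-RS");
    no check is made that the subtags are actually registered.
    """
    parts = [language.lower()]
    if script:
        parts.append(script.title())   # script subtags are written in title case
    if region:
        parts.append(region.upper())   # region subtags are written in upper case
    if variant:
        parts.append(variant.lower())
    return "-".join(parts)


def written_rep_triple(form_uri, text, tag):
    """Return one Turtle triple giving a form a language-tagged written representation."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{form_uri}> ontolex:writtenRep "{escaped}"@{tag} .'


if __name__ == "__main__":
    # Hypothetical form URI, used purely for illustration.
    tag = make_language_tag("en", region="za")
    print(written_rep_triple("http://example.org/lexicon/braai#form", "braai", tag))
```

The ontolex: prefix still has to be declared in the surrounding Turtle document.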
Hope this helps :)
It definitely helps! Thank you, Julia.
While language tags are quite useful, and I will employ them, they are somewhat limiting for very small communities of speakers (a normal limitation of any finite list). Since I have to work on a model that also covers such cases, I was looking to ground the information in geographical regions. I was not aware of the lexvo ontology, so thank you very much for it. It seems to partially resolve my problem (however, using the property implies declaring an instance of a language, which is not quite what I want).
On another note, I searched a bit for the KD vocabulary extension you mentioned in the article but I could not find any links. Do you have one?



On 7 May 2019, at 17:23, Frances Gillis-Webber <fran@fynbosch.com> wrote:

Hi Nicola

My colleagues and I have approached it in two different ways:

(1) Encoding the spatial data in the language tag, when language-tagging a string literal

Each latlon coordinate can be converted to a geohash (can be low precision), and then the region can be represented as a polygon. For a polygon, the first and last coordinate is the same, so we have excluded the last coordinate from the string, separating each geohash with a "--". This string can be included in the privateuse portion of the language tag.

We have described the solution in detail in the paper [1].
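To make approach (1) a little more concrete, the following Python sketch encodes each coordinate with the standard geohash algorithm, drops the closing coordinate of the polygon ring, and joins the hashes with "--". The function names are hypothetical, and placing the resulting string directly after "-x-" is an assumption; the exact tag layout is the one defined in the paper [1], which is not reproduced here.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet


def geohash_encode(lat, lon, precision=5):
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid          # keep the upper half of the interval
        else:
            bits = bits << 1
            rng[1] = mid          # keep the lower half of the interval
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:        # every 5 bits yield one base-32 character
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def polygon_to_privateuse(coords, precision=5):
    """Geohash each (lat, lon) vertex and join with "--", omitting a repeated closing vertex."""
    ring = list(coords)
    if len(ring) > 1 and ring[0] == ring[-1]:
        ring = ring[:-1]
    return "--".join(geohash_encode(lat, lon, precision) for lat, lon in ring)


def spatial_language_tag(language, coords, precision=5):
    """Assumed layout: <language>-x-<geohash string>; see the paper for the real syntax."""
    return f"{language}-x-{polygon_to_privateuse(coords, precision)}"
```

A lower precision gives shorter strings at the cost of coarser cells, which matches the remark that the geohash can be low precision. It is worth checking the resulting tag against whatever BCP 47 validation your tooling applies before relying on it.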

(2) Modelling the language data

In the Ontolex-Lemon specification, a language is modelled as follows:

<lexical entry> dct:language <to a language code URI>

However, in place of the language code URI, you could use your own URI, and then model the geographic data from there.
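A minimal sketch of this idea, written as a Python function that emits Turtle: the lexical entry points to a custom language URI via dct:language, and the geographic data hangs off that URI. Note that this is not MoLA's modelling. The link to the region uses lexvo:usedIn (mentioned earlier in this thread) with WGS84 coordinates, and all example.org URIs and the function name are hypothetical.

```python
PREFIXES = """@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix lexvo: <http://lexvo.org/ontology#> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
"""


def entry_with_regional_language(entry_uri, language_uri, region_uri, lat, lon):
    """Return a Turtle document linking an entry to a custom language URI
    and that language to a region described by a single WGS84 point."""
    lines = [
        PREFIXES,
        f"<{entry_uri}> a ontolex:LexicalEntry ;",
        f"    dct:language <{language_uri}> .",
        "",
        # Assumption: lexvo:usedIn stands in for whatever property your model defines.
        f"<{language_uri}> lexvo:usedIn <{region_uri}> .",
        "",
        f"<{region_uri}> a lexvo:GeographicRegion ;",
        f'    geo:lat "{lat}" ;',
        f'    geo:long "{lon}" .',
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(entry_with_regional_language(
        "http://example.org/lexicon/entry1",
        "http://example.org/lang/valley-variety",
        "http://example.org/region/valley",
        46.5, 8.3,
    ))
```

A polygon or WKT geometry could replace the single point if the region needs an extent rather than a location.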

We have created a lightweight ontology for language annotation called MoLA, and have accounted for both custom language tags and regions in the model. The solution is described in [2].

The ontology is here: http://ontology.londisizwe.org/mola

I'm currently working on the specification so I can supply proposed modelling using MoLA, if you like?

Hi Frances, this is definitely interesting. Would you mind sending me the article about approach 1? I would like to use WKT, but your solution seems pretty straightforward and very useful at an application level.

Regarding solution 2, it seems to work almost perfectly for me because I can use it to describe language information in time and space (thank you for the link with wgs84! :-) ), and it describes the several layers of variation I need. Thank you for it. It seems pretty straightforward, so no need for the documentation!


Best,

Nicola



[1] Accepted at LDK 2019: The Shortcomings of Language Tags for Linked Data when Modeling Lesser-Known Languages (F. Gillis-Webber & S. Tittel) (I can send you a PDF)
[2] Accepted at KGSWC 2019: A Model for Language Annotations on the Web (F. Gillis-Webber, S. Tittel and C.M. Keet). PDF: http://www.meteck.org/files/KGSWC19mola.pdf



On Tue, 7 May 2019 at 15:54, Nicola Carboni <nicola.carboni@uzh.ch> wrote:
Dear ontolex community,

I am currently modelling some data using ontolex and lexinfo.
However, I am unsure how to relate a lexical entry to a specific spatial area. My intent is to declare that an entry is used in a specific region of a country or in a limited spatial area (a valley or a small island, for example).

I was wondering whether anyone has faced the same challenge and which solutions were adopted.

Best,

Nicola




--
Nicola Carboni
Research Fellow
University of Zurich Post Box 23
Ramistrasse 71 8006 Zurich
Switzerland
Received on Thursday, 9 May 2019 11:29:31 UTC