Re: Modelling relationship between lexical entries and spatial regions from Nicola Carboni on 2019-05-09 (public-ontolex@w3.org from May 2019)

From: Nicola Carboni <nicola.carboni@uzh.ch>
Date: Thu, 9 May 2019 08:54:45 +0200
To: Julia Bosque Gil <jbosque@fi.upm.es>, Frances Gillis-Webber <fran@fynbosch.com>
Cc: public-ontolex@w3.org
Message-Id: <F689964E-0042-4276-BBF2-58192131520B@uzh.ch>
> On 7 May 2019, at 15:52, Julia Bosque Gil <jbosque@fi.upm.es> wrote:
> 
> 
> You can include information about a specific region, script or variant using language tags and subtags [1, 2, 3] with the ontolex:writtenRep. In that respect, the OntoLex Spec provides more links in this part: 
> 
> Furthermore, we require that instances of the model adhere to the RDF 1.1 specification <http://www.w3.org/TR/rdf11-concepts/> and follow the appropriate guidelines. In particular, we require that language tags adhere to Best Common Practice 47 <http://www.rfc-editor.org/rfc/bcp/bcp47.txt>, where tags are made up of a language code (based on ISO 639 codes part 1, 2, 3 or 5 <http://www.iso.org/iso/home/standards/language_codes.htm>), optionally followed by a hyphen and a ISO 3166-1 <http://www.iso.org/iso/iso-3166-1_decoding_table.html> country code. Language tags may also contain further subtags <https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry> expressing e.g. the region, script or further variants.
> 
> In the Lexicography page [4] (Issue 5) we discussed that ontolex:usage could be applied to cases in which you want to specify that a sense of an entry is attested in (only) a particular region. Another option is the use of lexvo:usedIn (with range lexvo:GeographicRegion) [5], but the property is described as "The property of a language or writing system [emphasis added] being used somewhat extensively in a particular geographical region at some point in time" (although the domain is not restricted). In our work with K Dictionaries in 2016 [6] we decided to create a custom property kd:geographicalUsage for this, but I would say that opting for the language tag option, whenever possible, would be preferable.
> 
> Hope this helps :)
> 
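
A minimal sketch of the language-tag option described above, assuming a tag made of a language code plus an optional ISO 3166-1 region. The helper names (language_tag, written_rep), the form identifier and the example lemma are illustrative assumptions and do not come from the thread or from the OntoLex specification; the check covers only this simple language-region shape, not the full BCP 47 grammar.

```python
import re

# Simplified shapes only: 2-3 letter language code, 2-letter or 3-digit region.
_LANGUAGE = re.compile(r"[a-z]{2,3}")
_REGION = re.compile(r"[A-Z]{2}|[0-9]{3}")


def language_tag(language, region=None):
    """Build a 'language' or 'language-REGION' tag (hypothetical helper)."""
    if not _LANGUAGE.fullmatch(language):
        raise ValueError(f"unexpected language code: {language!r}")
    if region is None:
        return language
    if not _REGION.fullmatch(region):
        raise ValueError(f"unexpected region code: {region!r}")
    return f"{language}-{region}"


def written_rep(form, text, tag):
    """Format one Turtle statement attaching a language-tagged written form.

    Assumes `text` contains no characters that need escaping in Turtle.
    """
    return f'{form} ontolex:writtenRep "{text}"@{tag} .'


if __name__ == "__main__":
    print(written_rep(":colour_form", "colour", language_tag("en", "GB")))
```

Run as a script, this prints a single Turtle statement with the literal tagged en-GB. Further subtags (script, variant) would need a fuller validator than this sketch provides.
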
It definitely helps! Thank you, Julia.
While language tags are quite useful, and I will employ them, they are a bit limiting for very small communities of speakers (a normal limitation of any finite list). Since I have to work on a model that also covers such cases, I was looking to ground the information in geographical regions. I was not aware of the lexvo ontology, so thank you very much for it. It seems to partially resolve my problem (however, using the property implies declaring an instance of a language, which is not quite what I want).
On another note, I searched a bit for the KD vocabulary extension you mentioned in the article, but I could not find any links. Do you have one?


> On 7 May 2019, at 17:23, Frances Gillis-Webber <fran@fynbosch.com> wrote:
> 
> Hi Nicola
> 
> My colleagues and I have approached it in two different ways:
> 
> (1) Encoding the spatial data in the language tag, when language-tagging a string literal
> 
> Each latlon coordinate can be converted to a geohash (can be low precision), and then the region can be represented as a polygon. For a polygon, the first and last coordinate is the same, so we have excluded the last coordinate from the string, separating each geohash with a "--". This string can be included in the privateuse portion of the language tag.
> 
> We have described the solution in detail in the paper [1].
> 
> (2) Modelling the language data
> 
> In the Ontolex-Lemon specification, a language is modelled as follows:
> 
> <lexical entry> dct:language <to a language code URI>
> 
> However, in place of the language code URI, you could use your own URI, and then model the geographic data from there.
> 
> We have created a lightweight ontology for language annotation called MoLA, and have accounted for both custom language tags and regions in the model. The solution is described in [2].
> 
> The ontology is here: http://ontology.londisizwe.org/mola
> 
> I'm currently working on the specification so I can supply proposed modelling using MoLA, if you like?
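
Approach (1) above can be sketched roughly as follows. This is only an illustration of the steps named in the message (coordinate to geohash, drop the repeated closing coordinate, join with "--"); the function names, the default precision and the sample coordinates are assumptions, and how the resulting string is embedded in the privateuse portion of the tag is left to the paper [1], since the thread does not show it.

```python
# Standard geohash base-32 alphabet (no "a", "i", "l", "o").
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=5):
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    value = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            bits = 0
            value = 0
    return "".join(chars)


def polygon_to_geohash_string(ring, precision=5):
    """Join the geohashes of a polygon ring with '--' (hypothetical helper).

    `ring` is a list of (lat, lon) pairs; if it is closed (first == last),
    the repeated last coordinate is dropped, as described in the message.
    """
    if len(ring) >= 2 and ring[0] == ring[-1]:
        ring = ring[:-1]
    return "--".join(geohash_encode(lat, lon, precision) for lat, lon in ring)


if __name__ == "__main__":
    # Made-up triangle, closed by repeating the first coordinate.
    triangle = [(46.50, 9.80), (46.55, 9.90), (46.45, 9.90), (46.50, 9.80)]
    print(polygon_to_geohash_string(triangle))
```

A lower precision gives shorter strings at the cost of coarser cells, which matches the remark that the geohash "can be low precision".
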

Hi Frances, this is definitely interesting. Would you mind sending me the article about approach 1? I would like to use WKT, but your solution seems pretty straightforward and very useful at the application level.

Regarding solution 2, it seems to work almost perfectly for me, because I can use it to describe language information in time and space (thank you for the link with wgs84! :-) ), and it does describe the several layers of variation I need. Thank you for it, it seems pretty straightforward, so no need for the documentation!
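
For illustration only, the general idea behind solution 2 (pointing dct:language at one's own URI and hanging the geographic data off that resource) might look like the sketch below. It deliberately does not use MoLA terms, since the thread does not list them; instead it assumes dct:spatial and the wgs84 geo:lat / geo:long properties, and every ex: identifier and coordinate value is made up.

```python
# Namespaces used in the sketch; "ex" is a placeholder namespace.
PREFIXES = {
    "ontolex": "http://www.w3.org/ns/lemon/ontolex#",
    "dct": "http://purl.org/dc/terms/",
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
    "ex": "http://example.org/",
}


def language_variety_turtle(entry, variety, place, lat, lon):
    """Return Turtle linking an entry to a custom language URI with a location.

    All local names are hypothetical; the modelling is one possible reading
    of the approach, not the MoLA model itself.
    """
    lines = [f"@prefix {p}: <{uri}> ." for p, uri in PREFIXES.items()]
    lines += [
        "",
        f"ex:{entry} a ontolex:LexicalEntry ;",
        f"    dct:language ex:{variety} .",
        "",
        f"ex:{variety} a dct:LinguisticSystem ;",
        f"    dct:spatial ex:{place} .",
        "",
        f"ex:{place} a geo:SpatialThing ;",
        f'    geo:lat "{lat}" ;',
        f'    geo:long "{lon}" .',
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(language_variety_turtle("entry_1", "valley_variety", "valley", 46.5, 9.8))
```

A single point is used here for brevity; a region would more likely be a polygon (for example in WKT), which this sketch does not attempt.
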


Best,

Nicola

> On 7 May 2019, at 17:23, Frances Gillis-Webber <fran@fynbosch.com> wrote:
> 
> Kind regards,
> Frances
> 
> [1] Accepted at LDK 2019: The Shortcomings of Language Tags for Linked Data when Modeling Lesser-Known Languages (F. Gillis-Webber & S. Tittel) (I can send you a PDF)
> [2] Accepted at KGSWC 2019: A Model for Language Annotations on the Web (F. Gillis-Webber, S. Tittel and C.M. Keet). PDF: http://www.meteck.org/files/KGSWC19mola.pdf
> 
> 
> 
> On Tue, 7 May 2019 at 15:54, Nicola Carboni <nicola.carboni@uzh.ch> wrote:
> Dear ontolex community,
> 
> I am currently modelling some data using ontolex and lexinfo. 
> However, I have some doubts about how to relate a lexical entry to a specific spatial area. My intent is to declare that an entry is used in a specific region of a country or in a limited spatial area (a valley or a small island, for example).
> 
> I was wondering whether anyone has faced the same challenge and what solutions were adopted.
> 
> Best,
> 
> Nicola
> 
> 
> 
> 
> --
> Nicola Carboni
> Research Fellow
> University of Zurich Post Box 23 
> Ramistrasse 71 8006 Zurich 
> Switzerland
> 
> 
> 
>
Received on Thursday, 9 May 2019 06:55:20 UTC