About the frac module and the representation of similarities

Dear FRAC workers :-)

I recently re-read the wiki 
(https://acoli-repo.github.io/ontolex-frac/), with a focus on embeddings.

The text in general is already very good, and I liked the direction 
taken for representing word embeddings attached to a "lexical element" 
(form, concept, sense, entry, ...).

I was also interested in how to represent similarities resulting from a 
(pre-trained) word embedding data set. For example, how does the form 
"card" relate to other forms in terms of the similarities induced by 
word2vec (or other approaches)?

I have now come up with two distinct suggestions: one using blank nodes, 
and one using instances of a class (in case we want to "reify" such 
semantic similarities). The small experiment is done within the core 
module of OntoLex-Lemon and could easily be transferred to the frac 
module once it is operational. But first we need to get to a good and 
consensual modeling!

Below you can find the very preliminary code. Please check the two 
suggested ways to encode the semantic similarity between "card" and 
"cards" with the property hasEmbeddingSimilarityWith (in one case 
pointing to an instance, in the other case pointing to a blank node). We 
would further need to indicate the decreasing order of similarities with 
other forms.

Well, just some food for thought for the next telco on frac.

Thanks!

Thierry


# baseURI: http://tutorial-topbraid.com/morphsem
# imports: http://purl.org/dc/elements/1.1/
# imports: http://purl.org/dc/terms/
# imports: http://www.lexinfo.net/ontology/2.0/lexinfo
# imports: http://www.w3.org/2004/02/skos/core
# imports: http://www.w3.org/ns/lemon/decomp
# imports: http://www.w3.org/ns/lemon/ontolex

@prefix : <http://tutorial-topbraid.com/morphsem#> .
@prefix decomp: <http://www.w3.org/ns/lemon/decomp#> .
@prefix lemon: <http://lemon-model.net/lemon#> .
@prefix lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#> .
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

lemon:hasSimilarity_value
   rdf:type owl:DatatypeProperty ;
   rdfs:comment "gives a real number as value" ;
   rdfs:domain :Embedding_Similarities ;
   rdfs:label "Similarity value"@en ;
   rdfs:range xsd:decimal ;
.
<http://tutorial-topbraid.com/morphsem>
   rdf:type owl:Ontology ;
   owl:imports <http://purl.org/dc/elements/1.1/> ;
   owl:imports <http://purl.org/dc/terms/> ;
   owl:imports <http://www.lexinfo.net/ontology/2.0/lexinfo> ;
   owl:imports <http://www.w3.org/2004/02/skos/core> ;
   owl:imports <http://www.w3.org/ns/lemon/decomp> ;
   owl:imports <http://www.w3.org/ns/lemon/ontolex> ;
   owl:versionInfo "Created with TopBraid Composer" ;
.
:Embedding_Algorithm
   rdf:type owl:Class ;
   rdfs:comment "Listing the distinct approaches" ;
   rdfs:label "Embedding algorithm" ;
.
:Embedding_Algorithm_W2V
   rdf:type :Embedding_Algorithm ;
   rdfs:comment "the Word2Vec algorithm" ;
   rdfs:label "Word2Vec" ;
.
:Embedding_Similarities
   rdf:type owl:Class ;
   rdfs:comment "Represents similarities given by embedding algorithms" ;
   rdfs:label "Embedding similarities"@en ;
.
:Embedding_W2C_1
   rdf:type :Embedding_Similarities ;
   lemon:hasSimilarity_value 0.89 ;
   :ObjectProperty_hasEmbeddingAlgorithm :Embedding_Algorithm_W2V ;
   rdfs:comment "Represents a similarity given by word2vec" ;
   rdfs:label "Word2Vec Embedding similarities"@en ;
.
:Form_card
   rdf:type ontolex:Form ;
   :ObjectProperty_hasEmbeddingSimilarityWith :Embedding_W2C_1 ;
   :ObjectProperty_hasEmbeddingSimilarityWith [
       :ObjectProperty_hasEmbeddingAlgorithm :Embedding_Algorithm_W2V ;
       rdf:value 0.89 ;
       ontolex:canonicalForm :Form_cards ;
       ontolex:writtenRep "cards"@en ;
     ] ;
   lexinfo:number lexinfo:singular ;
   ontolex:writtenRep "card"@en ;
.
:Form_cards
   rdf:type ontolex:Form ;
   lexinfo:number lexinfo:plural ;
   ontolex:writtenRep "cards"@en ;
.
:LexicalSense_card
   rdf:type ontolex:LexicalSense ;
   rdfs:comment "portable physical object used for identification, authentication, data storage, or financial transaction (taken from Wikidata)"@en ;
   ontolex:isSenseOf :card ;
   ontolex:reference <https://www.wikidata.org/wiki/Q42965339> ;
.
:ObjectProperty_hasEmbeddingAlgorithm
   rdf:type owl:ObjectProperty ;
   rdfs:domain :Embedding_Similarities ;
   rdfs:range :Embedding_Algorithm ;
.
:ObjectProperty_hasEmbeddingSimilarityWith
   rdf:type owl:ObjectProperty ;
   rdfs:domain ontolex:Form ;
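   # Two separate rdfs:range statements would be read as an intersection;
   # a union lets the value be either a similarity instance or a form.
   rdfs:range [
       rdf:type owl:Class ;
       owl:unionOf ( :Embedding_Similarities ontolex:Form ) ;
     ] ;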
.
:card
   rdf:type ontolex:Word ;
   lexinfo:partOfSpeech lexinfo:noun ;
   ontolex:canonicalForm :Form_card ;
   ontolex:otherForm :Form_cards ;
   ontolex:reference <https://www.wikidata.org/wiki/Q1420> ;
   ontolex:sense :LexicalSense_card ;
.
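
The decreasing order of similarities mentioned above is not yet covered 
by this code. A minimal sketch of one possible encoding follows, assuming 
two hypothetical properties that are not part of OntoLex-Lemon or frac: 
:hasSimilarityRank (1 = most similar) and :hasSimilarityTarget (the form 
compared against). The second neighbour, :Form_cardboard, and its value 
0.61 are invented placeholders for illustration only, not word2vec output.

```turtle
@prefix : <http://tutorial-topbraid.com/morphsem#> .
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Hypothetical: position in the neighbour list sorted by decreasing
# similarity (1 = most similar).
:hasSimilarityRank
   rdf:type owl:DatatypeProperty ;
   rdfs:domain :Embedding_Similarities ;
   rdfs:range xsd:integer ;
.
# Hypothetical: the form against which the similarity was computed.
:hasSimilarityTarget
   rdf:type owl:ObjectProperty ;
   rdfs:domain :Embedding_Similarities ;
   rdfs:range ontolex:Form ;
.
:Form_card
   :ObjectProperty_hasEmbeddingSimilarityWith [
       rdf:type :Embedding_Similarities ;
       :hasSimilarityTarget :Form_cards ;
       :hasSimilarityRank 1 ;
       rdf:value 0.89 ;
     ] , [
       rdf:type :Embedding_Similarities ;
       # Placeholder neighbour and score, invented for this sketch.
       :hasSimilarityTarget :Form_cardboard ;
       :hasSimilarityRank 2 ;
       rdf:value 0.61 ;
     ] ;
.
```

An explicit rank keeps the ordering queryable without comparing the 
values themselves, at the cost of having to renumber whenever a neighbour 
is added. Sorting on the similarity value at query time would be an 
alternative that needs no extra property.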


-- 
Thierry Declerck
Senior Consultant at DFKI GmbH, Multilinguality and Language Technology
Stuhlsatzenhausweg, 3
D-66123 Saarbruecken
Phone: +49 681 / 857 75-53 58
Fax: +49 681 / 857 75-53 38
email: declerck@dfki.de

Received on Sunday, 25 August 2019 18:10:58 UTC