About the frac module and the representation of similarities

Dear FRAC workers :-)

I recently re-read the wiki 
(https://acoli-repo.github.io/ontolex-frac/), with a focus on embeddings.

The text in general is already very good, and I like the direction 
taken for representing word embeddings attached to a "lexical element" 
(form, concept, sense, entry, ...).

I was also interested in how to represent similarities resulting 
from a (pre-trained) word embedding data set: for example, how the 
form "card" relates to other forms in terms of the similarities induced 
by word2vec (or other approaches).
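
To make concrete what "similarity induced by an embedding" means here, the following is a minimal Python sketch, assuming the usual cosine measure between word vectors. The three-dimensional vectors and the words "ticket" and "river" are hypothetical toy data, not taken from any trained word2vec model, and the helper names are illustrative only.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors; real word2vec vectors have hundreds of dimensions.
vectors = {
    "card": [1.0, 0.0, 0.0],
    "cards": [0.8, 0.6, 0.0],
    "ticket": [0.6, 0.0, 0.8],
    "river": [0.0, 0.0, 1.0],
}


def neighbours(word, vectors):
    """Return (other_word, similarity) pairs, most similar first."""
    target = vectors[word]
    scored = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for other, score in neighbours("card", vectors):
        print(f"card ~ {other}: {score:.2f}")
```

Each (form, other form, score) triple produced this way is the kind of information the suggestions below try to express in RDF.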

I have now come up with two distinct suggestions: one using blank nodes, 
and one using instances of a class (in case we want to "reify" such 
semantic similarities). The small experiment is done within the core 
module of OntoLex-Lemon and could easily be transferred to the frac 
module once it is operational. But first we need to get to a good and consensual 

Below you can find the very preliminary code. Please check the two 
suggested ways to encode the semantic similarity between "card" and 
"cards" with the property hasEmbeddingSimilarityWith (in one case 
pointing to an instance, in the other case pointing to a blank node). 
We would further need to indicate the decreasing order of similarities 
with other forms.
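
One possible way to record the decreasing order is to attach an explicit rank to each similarity node. The Python sketch below generates such Turtle for the blank-node variant. It is only an illustration under stated assumptions: the property :rank is hypothetical (it does not exist in the code below or in OntoLex), the score 0.89 for "cards" is the one used in this mail, and the neighbour "ticket" with score 0.71 is invented for the example.

```python
def similarity_triples(form, scored_neighbours):
    """Return Turtle text linking `form` to its neighbours, most similar first.

    `scored_neighbours` is a list of (written form, similarity score) pairs.
    """
    ranked = sorted(scored_neighbours, key=lambda pair: pair[1], reverse=True)
    lines = [f":Form_{form}"]
    for rank, (other, score) in enumerate(ranked, start=1):
        lines.append("   :ObjectProperty_hasEmbeddingSimilarityWith [")
        lines.append("       :hasEmbeddingAlgorithm :Embedding_Algorithm_W2V ;")
        lines.append(f"       rdf:value {score:.2f} ;")
        lines.append(f"       ontolex:canonicalForm :Form_{other} ;")
        # :rank is a hypothetical property: 1 = most similar form.
        lines.append(f"       :rank {rank} ;")
        lines.append("     ] ;")
    lines.append(".")
    return "\n".join(lines)


if __name__ == "__main__":
    # 0.89 comes from the example in this mail; 0.71 is an invented value.
    print(similarity_triples("card", [("ticket", 0.71), ("cards", 0.89)]))
```

An explicit rank is redundant with the score itself, since a query can always sort on the value, so whether it is worth storing is a design choice to discuss.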

Well, just some food for thought for the next telco on frac.



# baseURI: http://tutorial-topbraid.com/morphsem
# imports: http://purl.org/dc/elements/1.1/
# imports: http://purl.org/dc/terms/
# imports: http://www.lexinfo.net/ontology/2.0/lexinfo
# imports: http://www.w3.org/2004/02/skos/core
# imports: http://www.w3.org/ns/lemon/decomp
# imports: http://www.w3.org/ns/lemon/ontolex

@prefix : <http://tutorial-topbraid.com/morphsem#> .
@prefix decomp: <http://www.w3.org/ns/lemon/decomp#> .
@prefix lemon: <http://lemon-model.net/lemon#> .
@prefix lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#> .
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Datatype property carrying the similarity score
lemon:hasSimilarity_value
   rdf:type owl:DatatypeProperty ;
   rdfs:comment "gives a real number as value" ;
   rdfs:domain :Embedding_Similarities ;
   rdfs:label "Similarity value"@en ;
   rdfs:range xsd:decimal ;
.
<http://tutorial-topbraid.com/morphsem>
   rdf:type owl:Ontology ;
   owl:imports <http://purl.org/dc/elements/1.1/> ;
   owl:imports <http://purl.org/dc/terms/> ;
   owl:imports <http://www.lexinfo.net/ontology/2.0/lexinfo> ;
   owl:imports <http://www.w3.org/2004/02/skos/core> ;
   owl:imports <http://www.w3.org/ns/lemon/decomp> ;
   owl:imports <http://www.w3.org/ns/lemon/ontolex> ;
   owl:versionInfo "Created with TopBraid Composer" ;
.
:Embedding_Algorithm
   rdf:type owl:Class ;
   rdfs:comment "Listing the distinct approaches" ;
   rdfs:label "Embedding algorithm" ;
.
:Embedding_Algorithm_W2V
   rdf:type :Embedding_Algorithm ;
   rdfs:comment "the Word2Vec algorithm" ;
   rdfs:label "Word2Vec" ;
.
:Embedding_Similarities
   rdf:type owl:Class ;
   rdfs:comment "Represents similarities as given by embedding algorithms" ;
   rdfs:label "Embedding similarities"@en ;
.
# Suggestion 1: the similarity is reified as an instance
:Embedding_W2C_1
   rdf:type :Embedding_Similarities ;
   lemon:hasSimilarity_value 0.89 ;
   :ObjectProperty_hasEmbeddingAlgorithm :Embedding_Algorithm_W2V ;
   rdfs:comment "Represents similarities as given by word2vec" ;
   rdfs:label "Word2Vec Embedding similarities"@en ;
.
:Form_card
   rdf:type ontolex:Form ;
   # pointing to the instance (suggestion 1)
   :ObjectProperty_hasEmbeddingSimilarityWith :Embedding_W2C_1 ;
   # Suggestion 2: pointing to a blank node
   :ObjectProperty_hasEmbeddingSimilarityWith [
       :hasEmbeddingAlgorithm :Embedding_Algorithm_W2V ;
       rdf:value "0.89" ;
       ontolex:canonicalForm :Form_cards ;
       ontolex:writtenRep "cards"@en ;
     ] ;
   lexinfo:number lexinfo:singular ;
   ontolex:writtenRep "card"@en ;
.
:Form_cards
   rdf:type ontolex:Form ;
   lexinfo:number lexinfo:plural ;
   ontolex:writtenRep "cards"@en ;
.
:LexicalSense_card
   rdf:type ontolex:LexicalSense ;
   rdfs:comment "portable physical object used for identification, authentication, data storage, or financial transaction (taken from wikidata)"@en ;
   ontolex:isSenseOf :card ;
   ontolex:reference <https://www.wikidata.org/wiki/Q42965339> ;
.
:ObjectProperty_hasEmbeddingAlgorithm
   rdf:type owl:ObjectProperty ;
   rdfs:domain :Embedding_Similarities ;
   rdfs:range :Embedding_Algorithm ;
.
:ObjectProperty_hasEmbeddingSimilarityWith
   rdf:type owl:ObjectProperty ;
   rdfs:domain ontolex:Form ;
   # two rdfs:range statements are read conjunctively in RDFS/OWL
   rdfs:range :Embedding_Similarities ;
   rdfs:range ontolex:Form ;
.
:card
   rdf:type ontolex:Word ;
   lexinfo:partOfSpeech lexinfo:noun ;
   ontolex:canonicalForm :Form_card ;
   ontolex:otherForm :Form_cards ;
   ontolex:reference <https://www.wikidata.org/wiki/Q1420> ;
   ontolex:sense :LexicalSense_card ;
.

Thierry Declerck
Senior Consultant at DFKI GmbH, Multilinguality and Language Technology
Stuhlsatzenhausweg, 3
D-66123 Saarbruecken
Phone: +49 681 / 857 75-53 58
Fax: +49 681 / 857 75-53 38
email: declerck@dfki.de

Received on Sunday, 25 August 2019 18:10:58 UTC