- From: Thierry Declerck <declerck@dfki.de>
- Date: Sun, 25 Aug 2019 20:11:38 +0200
- To: public-ontolex@w3.org
Dear FRAC workers :-)
I recently re-read the wiki
(https://acoli-repo.github.io/ontolex-frac/), with a focus on embeddings.
The text is already very good overall, and I like the direction
taken for representing word embeddings attached to a "lexical element"
(form, concept, sense, entry, ...).
I was also interested in how to represent the similarities that result
from a (pre-trained) word embedding data set: for example, how the
form "card" relates to other forms in terms of the similarities induced
by word2vec (or other approaches).
I have now come up with two distinct suggestions: one using blank nodes, and
one using instances of a class (in case we want to "reify" such semantic
similarities). The small experiment is done within the core module of
OntoLex-Lemon and could easily be transferred to the frac module once
it is operational. But first we need to arrive at a good, consensual
model!
Below you can find the very preliminary code. Please check the two
suggested ways to encode the semantic similarity between "card" and "cards"
with the property hasEmbeddingSimilarityWith (in one case pointing to an
instance, in the other case pointing to a blank node). We would also need
to indicate the decreasing order of similarities with other forms.
Well, just some food for thought for the next telco on frac.
Thanks!
Thierry
# baseURI: http://tutorial-topbraid.com/morphsem
# imports: http://purl.org/dc/elements/1.1/
# imports: http://purl.org/dc/terms/
# imports: http://www.lexinfo.net/ontology/2.0/lexinfo
# imports: http://www.w3.org/2004/02/skos/core
# imports: http://www.w3.org/ns/lemon/decomp
# imports: http://www.w3.org/ns/lemon/ontolex
@prefix : <http://tutorial-topbraid.com/morphsem#> .
@prefix decomp: <http://www.w3.org/ns/lemon/decomp#> .
@prefix lemon: <http://lemon-model.net/lemon#> .
@prefix lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#> .
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
lemon:hasSimilarity_value
rdf:type owl:DatatypeProperty ;
rdfs:comment "gives a real number as value" ;
rdfs:domain :Embedding_Similarities ;
rdfs:label "Similarity value"@en ;
rdfs:range xsd:decimal ;
.
<http://tutorial-topbraid.com/morphsem>
rdf:type owl:Ontology ;
owl:imports <http://purl.org/dc/elements/1.1/> ;
owl:imports <http://purl.org/dc/terms/> ;
owl:imports <http://www.lexinfo.net/ontology/2.0/lexinfo> ;
owl:imports <http://www.w3.org/2004/02/skos/core> ;
owl:imports <http://www.w3.org/ns/lemon/decomp> ;
owl:imports <http://www.w3.org/ns/lemon/ontolex> ;
owl:versionInfo "Created with TopBraid Composer" ;
.
:Embedding_Algorithm
rdf:type owl:Class ;
rdfs:comment "Listing the distinct approaches" ;
rdfs:label "Embedding algorithm" ;
.
:Embedding_Algorithm_W2V
rdf:type :Embedding_Algorithm ;
rdfs:comment "the Word2Vec algorithm" ;
rdfs:label "Word2Vec" ;
.
:Embedding_Similarities
rdf:type owl:Class ;
rdfs:comment "Represents similarities given by embedding algorithms" ;
rdfs:label "Embedding similarities"@en ;
.
:Embedding_W2C_1
rdf:type :Embedding_Similarities ;
lemon:hasSimilarity_value 0.89 ;
:ObjectProperty_hasEmbeddingAlgorithm :Embedding_Algorithm_W2V ;
rdfs:comment "Represents a similarity given by word2vec" ;
rdfs:label "Word2Vec Embedding similarities"@en ;
.
:Form_card
rdf:type ontolex:Form ;
:ObjectProperty_hasEmbeddingSimilarityWith :Embedding_W2C_1 ;
:ObjectProperty_hasEmbeddingSimilarityWith [
:ObjectProperty_hasEmbeddingAlgorithm :Embedding_Algorithm_W2V ;
rdf:value 0.89 ;
ontolex:canonicalForm :Form_cards ;
ontolex:writtenRep "cards"@en ;
] ;
lexinfo:number lexinfo:singular ;
ontolex:writtenRep "card"@en ;
.
:Form_cards
rdf:type ontolex:Form ;
lexinfo:number lexinfo:plural ;
ontolex:writtenRep "cards"@en ;
.
:LexicalSense_card
rdf:type ontolex:LexicalSense ;
rdfs:comment "portable physical object used for identification, authentication, data storage, or financial transaction (taken from Wikidata)"@en ;
ontolex:isSenseOf :card ;
ontolex:reference <https://www.wikidata.org/wiki/Q42965339> ;
.
:ObjectProperty_hasEmbeddingAlgorithm
rdf:type owl:ObjectProperty ;
rdfs:domain :Embedding_Similarities ;
rdfs:range :Embedding_Algorithm ;
.
:ObjectProperty_hasEmbeddingSimilarityWith
rdf:type owl:ObjectProperty ;
rdfs:domain ontolex:Form ;
rdfs:range [
rdf:type owl:Class ;
owl:unionOf ( :Embedding_Similarities ontolex:Form ) ;
] ;
.
:card
rdf:type ontolex:Word ;
lexinfo:partOfSpeech lexinfo:noun ;
ontolex:canonicalForm :Form_card ;
ontolex:otherForm :Form_cards ;
ontolex:reference <https://www.wikidata.org/wiki/Q1420> ;
ontolex:sense :LexicalSense_card ;
.
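
The message leaves open how to indicate the decreasing order of similarities. One possible way, given here only as a rough sketch, is to attach an explicit rank to each reified similarity and to state which form it compares against. The properties :similarTo and :rank and the instance name :Embedding_W2V_card_1 are hypothetical; they are not part of OntoLex-Lemon or of the listing above. The only value used is the 0.89 for "card"/"cards" from the listing.

```turtle
@prefix : <http://tutorial-topbraid.com/morphsem#> .
@prefix lemon: <http://lemon-model.net/lemon#> .
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .

# Hypothetical additions (assumptions for illustration only):
#   :similarTo  links a similarity node to the form being compared
#   :rank       position among the neighbours of the source form, 1 = most similar

:Form_card
    :ObjectProperty_hasEmbeddingSimilarityWith :Embedding_W2V_card_1 .

:Embedding_W2V_card_1
    a :Embedding_Similarities ;
    :ObjectProperty_hasEmbeddingAlgorithm :Embedding_Algorithm_W2V ;
    :similarTo :Form_cards ;
    lemon:hasSimilarity_value 0.89 ;
    :rank 1 .

# Further neighbours of "card" would follow the same pattern with :rank 2, 3, ...
```

An explicit rank is redundant if consumers can sort on the similarity value themselves, so it is probably only worth keeping when the scores are not published or when only the top-n neighbours are listed.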
--
Thierry Declerck
Senior Consultant at DFKI GmbH, Multilinguality and Language Technology
Stuhlsatzenhausweg, 3
D-66123 Saarbruecken
Phone: +49 681 / 857 75-53 58
Fax: +49 681 / 857 75-53 38
email: declerck@dfki.de
Received on Sunday, 25 August 2019 18:10:58 UTC