- From: Thierry Declerck <declerck@dfki.de>
- Date: Sun, 25 Aug 2019 20:11:38 +0200
- To: public-ontolex@w3.org
Dear FRAC workers :-)

Recently I re-read the wiki (https://acoli-repo.github.io/ontolex-frac/) with a focus on embeddings. The text in general is already very good, and I liked the direction taken for representing word embeddings attached to a "lexical element" (form, concept, sense, entry, ...).

I was also interested in how to represent the similarities resulting from a (pre-trained) word embedding data set. For example, how does the form "card" relate to other forms in terms of the similarities induced by word2vec (or other approaches)? I have now come up with two distinct suggestions: one using blank nodes, and one using instances of a class (in case we want to "reify" such semantic similarities). The small experiment is done within the core module of OntoLex-Lemon and could easily be transferred to the FrAC module once it is operational. But first we need to arrive at a good and consensual model!

Below you can find the very preliminary code. Please check the two suggested ways to encode the semantic similarity between "card" and "cards" with the property hasEmbeddingSimilarityWith: in one case it points to an instance, in the other case to a blank node. We would further need to indicate the decreasing order of similarities with other forms.

Well, just some food for thought for the next FrAC telco.

Thanks!
Thierry

```turtle
# baseURI: http://tutorial-topbraid.com/morphsem
# imports: http://purl.org/dc/elements/1.1/
# imports: http://purl.org/dc/terms/
# imports: http://www.lexinfo.net/ontology/2.0/lexinfo
# imports: http://www.w3.org/2004/02/skos/core
# imports: http://www.w3.org/ns/lemon/decomp
# imports: http://www.w3.org/ns/lemon/ontolex

@prefix : <http://tutorial-topbraid.com/morphsem#> .
@prefix decomp: <http://www.w3.org/ns/lemon/decomp#> .
@prefix lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#> .
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Numeric similarity score carried by an :Embedding_Similarities instance
:hasSimilarity_value
  rdf:type owl:DatatypeProperty ;
  rdfs:comment "gives a real number as value" ;
  rdfs:domain :Embedding_Similarities ;
  rdfs:label "Similarity value"@en ;
  rdfs:range xsd:decimal ;
.
<http://tutorial-topbraid.com/morphsem>
  rdf:type owl:Ontology ;
  owl:imports <http://purl.org/dc/elements/1.1/> ;
  owl:imports <http://purl.org/dc/terms/> ;
  owl:imports <http://www.lexinfo.net/ontology/2.0/lexinfo> ;
  owl:imports <http://www.w3.org/2004/02/skos/core> ;
  owl:imports <http://www.w3.org/ns/lemon/decomp> ;
  owl:imports <http://www.w3.org/ns/lemon/ontolex> ;
  owl:versionInfo "Created with TopBraid Composer" ;
.
:Embedding_Algorithm
  rdf:type owl:Class ;
  rdfs:comment "Lists the distinct embedding approaches" ;
  rdfs:label "Embedding algorithm" ;
.
:Embedding_Algorithm_W2V
  rdf:type :Embedding_Algorithm ;
  rdfs:comment "the Word2Vec algorithm" ;
  rdfs:label "Word2Vec" ;
.
:Embedding_Similarities
  rdf:type owl:Class ;
  rdfs:comment "Represents similarities given by embedding algorithms" ;
  rdfs:label "Embedding similarities"@en ;
.
# Suggestion 1: a named ("reified") similarity instance
:Embedding_W2C_1
  rdf:type :Embedding_Similarities ;
  :hasSimilarity_value 0.89 ;
  :ObjectProperty_hasEmbeddingAlgorithm :Embedding_Algorithm_W2V ;
  rdfs:comment "Represents similarities given by Word2Vec" ;
  rdfs:label "Word2Vec Embedding similarities"@en ;
.
:Form_card
  rdf:type ontolex:Form ;
  # Suggestion 1: point to the named instance
  :ObjectProperty_hasEmbeddingSimilarityWith :Embedding_W2C_1 ;
  # Suggestion 2: point to a blank node
  :ObjectProperty_hasEmbeddingSimilarityWith [
      :ObjectProperty_hasEmbeddingAlgorithm :Embedding_Algorithm_W2V ;
      rdf:value 0.89 ;
      ontolex:canonicalForm :Form_cards ;
      ontolex:writtenRep "cards"@en ;
    ] ;
  lexinfo:number lexinfo:singular ;
  ontolex:writtenRep "card"@en ;
.
:Form_cards
  rdf:type ontolex:Form ;
  lexinfo:number lexinfo:plural ;
  ontolex:writtenRep "cards"@en ;
.
:LexicalSense_card
  rdf:type ontolex:LexicalSense ;
  rdfs:comment "portable physical object used for identification, authentication, data storage, or financial transaction (taken from Wikidata)"@en ;
  ontolex:isSenseOf :card ;
  ontolex:reference <https://www.wikidata.org/wiki/Q42965339> ;
.
:ObjectProperty_hasEmbeddingAlgorithm
  rdf:type owl:ObjectProperty ;
  rdfs:domain :Embedding_Similarities ;
  rdfs:range :Embedding_Algorithm ;
.
:ObjectProperty_hasEmbeddingSimilarityWith
  rdf:type owl:ObjectProperty ;
  rdfs:domain ontolex:Form ;
  # the object is either a similarity instance or a form
  rdfs:range [ rdf:type owl:Class ; owl:unionOf ( :Embedding_Similarities ontolex:Form ) ] ;
.
:card
  rdf:type ontolex:Word ;
  lexinfo:partOfSpeech lexinfo:noun ;
  ontolex:canonicalForm :Form_card ;
  ontolex:otherForm :Form_cards ;
  ontolex:denotes <https://www.wikidata.org/wiki/Q1420> ;
  ontolex:sense :LexicalSense_card ;
.
```

--
Thierry Declerck
Senior Consultant at DFKI GmbH, Multilinguality and Language Technology
Stuhlsatzenhausweg, 3
D-66123 Saarbruecken
Phone: +49 681 / 857 75-53 58
Fax: +49 681 / 857 75-53 38
email: declerck@dfki.de

-------------------------------------------------------------
Deutsches Forschungszentrum für Künstliche Intelligenz GmbH
Trippstadter Strasse 122, D-67663 Kaiserslautern, Germany

Management: Prof. Dr. Jana Koehler (Chair), Dr. Walter Olthoff
Chairman of the Supervisory Board: Prof. Dr. h.c. Hans A. Aukes
Amtsgericht Kaiserslautern, HRB 2313

-------------------------------------------------------------
Received on Sunday, 25 August 2019 18:10:58 UTC
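The message does not say how a score such as 0.89 is computed or how the decreasing order of similarities would be stored. The Python sketch below is one possible illustration and is not part of the proposal. It assumes cosine similarity over word vectors (the measure commonly used with word2vec) and uses invented toy 3-dimensional vectors in place of a real pre-trained model. It ranks the neighbours of "card" in decreasing order and prints them using the blank-node pattern from the Turtle draft. The names `TOY_VECTORS`, `cosine`, `ranked_neighbours`, `to_turtle` and the property `:hasSimilarityRank` are hypothetical. In particular, `:hasSimilarityRank` is not an OntoLex-Lemon or FrAC term. It is only one way to make the order explicit, because a set of RDF triples carries no order of its own.

```python
import math

# Toy vectors standing in for pre-trained word2vec embeddings.
# The numbers are invented for illustration; they are NOT real model output.
TOY_VECTORS = {
    "card": [0.9, 0.1, 0.3],
    "cards": [0.8, 0.3, 0.4],
    "ticket": [0.6, 0.5, 0.2],
    "banana": [0.05, 0.9, 0.1],
}


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def ranked_neighbours(word, vectors):
    """Return (other_word, similarity) pairs in decreasing order of similarity."""
    target = vectors[word]
    scores = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def to_turtle(word, neighbours):
    """Serialise the ranking as Turtle, one blank node per neighbour.

    :hasSimilarityRank is a HYPOTHETICAL property used only to make the
    decreasing order explicit; it is not an OntoLex-Lemon or FrAC term.
    """
    blocks = []
    for rank, (other, score) in enumerate(neighbours, start=1):
        blocks.append(
            f":Form_{word} :ObjectProperty_hasEmbeddingSimilarityWith [\n"
            f"    :ObjectProperty_hasEmbeddingAlgorithm :Embedding_Algorithm_W2V ;\n"
            f"    rdf:value {score:.2f} ;\n"
            f"    :hasSimilarityRank {rank} ;\n"
            f'    ontolex:writtenRep "{other}"@en\n'
            f"] ."
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_turtle("card", ranked_neighbours("card", TOY_VECTORS)))
```

With these toy vectors the order is "cards", then "ticket", then "banana". The reified variant would instead mint a named `:Embedding_Similarities` instance (like `:Embedding_W2C_1`) for each neighbour in place of the `[ ... ]` node, and the score and rank would then be attached to that instance.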