- From: Julia Bosque Gil <jbosque@fi.upm.es>
- Date: Wed, 20 Mar 2019 13:45:35 +0100
- To: peikert.katrin@web.de
- Cc: public-ontolex <public-ontolex@w3.org>
- Message-ID: <CA+B92MuHFQ8BbqyYh+vWByUTV5z=k_SHuEcuv0rcJiUah==p8g@mail.gmail.com>
Dear Katrin,

I will try to provide some possible solutions for your second issue, concerning the single part-of-speech tag and the lexicog:Entry approach. My answers are inline ;)

> The second issue concerns senses and part of speech tags. In EPSD2 it is possible for an entry to have a “general” part of speech tag, but some senses of it have a different tag e.g. “gal”(big), which is characterized as a “V\i” , but it can also mean “goblet”, which is tagged as “N”. But since Ontolex does not allow an LexicalEntry to have more than one part of speech tag, it is unclear to me how one could model this phenomena. The lexicog solution would be to use a lexicog:Entry for “gal” in general, and three LexicalEntry-s for the three parts of speech.

Exactly, you would have *gal-v*, with senses [1-5], *gal-n* with sense [6], and *gal-adj* with sense [8].

> The problem is that EPSD2 stores information about the forms and their frequency for “gal”, but not for gal with senses [1-5], [6] or [7] separately. It is unclear which form of a word is connected to which sense and how often this specific sense with a specific form is used.

From what I understood, since this information is not explicitly provided in the dictionary, there is no way of automatically distinguishing this case from those in which *all forms* go with *all senses*, unless you take into account the case difference in the strings.

I see three possible ways of representing this. One of them is easier to query, but it is overkill and leads to a high number of triples. The other two are more concise, but they would create some lexical entries without a form, and you would need to query the dictionary entry to get the forms.

a) [*lots* of triples] Since these entries look just like those in which all forms go with all senses, each created LexicalEntry receives all the forms, which would therefore need to be triplicated. A future disambiguation step would involve an update to remove those Forms that are not realisations of the lexical entry at hand.

b) [more concise] Only one LexicalEntry receives *all forms* (e.g. an arbitrarily chosen one, say the one with the first sense, so *gal-v*). This might not be correct, but in this way there are no ontolex:Forms without a LexicalEntry. The other two LexicalEntries would not have a lexical form, but the lexicog:Entry would consist of LexicographicComponents that point to them via *describes*. lexicog:LexicographicComponents can also describe ontolex:Forms, since the range of the describes property is owl:Thing. If you state that the lexicog:Entry that includes components describing the three ontolex:LexicalEntries also has further components, each describing a Form, you can later get a list of all the forms described in that dictionary entry.

In this way, if you want to access the potential forms that would go with *gal-adj* or *gal-n*, you would need to perform a SPARQL query along the lines of: “Given that *gal-n* is described by a LexicographicComponent which is rdfs:member of a lexicog:Entry, give me all the ontolex:Forms that are described by LexicographicComponents which are also rdfs:member of that same lexicog:Entry”.

Alternatively: “Given that *gal-n* is described by a LexicographicComponent which is rdfs:member of a lexicog:Entry, give me all the ontolex:Forms of other LexicalEntries that are described by LexicographicComponents which are also rdfs:member of that same lexicog:Entry”, and then you would get the forms linked to *gal-v*. For this last query you would not actually need to create LexicographicComponents describing Forms, because you access the forms via *gal-v* (unless you consider that the EPSD does have a section in that entry devoted to form description and you want to capture that).

c) Just like (b), but the LexicographicComponents of the lexicog:Entry would describe ontolex:LexicalSenses rather than ontolex:LexicalEntries.

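To make the second query under (b) concrete, here is a minimal sketch in plain Python rather than SPARQL, using a toy set of (subject, predicate, object) tuples instead of a real triple store. The identifiers `:gal`, `:comp-v`, `:comp-n`, `:comp-adj`, `:form-1` and `:form-2` are hypothetical names made up for illustration, and the sketch assumes that the lexicog:Entry points to its components via rdfs:member and that only *gal-v* carries forms via ontolex:lexicalForm.

```python
# Toy triple store: plain (subject, predicate, object) tuples.
# All ":..." identifiers below are hypothetical, for illustration only.
DESCRIBES = "lexicog:describes"
MEMBER = "rdfs:member"
LEXICAL_FORM = "ontolex:lexicalForm"

triples = {
    # The dictionary entry and its three components.
    (":gal", MEMBER, ":comp-v"),
    (":gal", MEMBER, ":comp-n"),
    (":gal", MEMBER, ":comp-adj"),
    # Each component describes one lexical entry.
    (":comp-v", DESCRIBES, ":gal-v"),
    (":comp-n", DESCRIBES, ":gal-n"),
    (":comp-adj", DESCRIBES, ":gal-adj"),
    # Option (b): only gal-v receives the forms.
    (":gal-v", LEXICAL_FORM, ":form-1"),
    (":gal-v", LEXICAL_FORM, ":form-2"),
}


def subjects(pred, obj):
    """All subjects s such that (s, pred, obj) is in the store."""
    return {s for s, p, o in triples if p == pred and o == obj}


def objects(subj, pred):
    """All objects o such that (subj, pred, o) is in the store."""
    return {o for s, p, o in triples if s == subj and p == pred}


def sibling_forms(lexical_entry):
    """Forms of the other lexical entries described within the same
    dictionary entry as the given lexical entry."""
    forms = set()
    for comp in subjects(DESCRIBES, lexical_entry):
        for entry in subjects(MEMBER, comp):
            for other_comp in objects(entry, MEMBER):
                for described in objects(other_comp, DESCRIBES):
                    if described != lexical_entry:
                        forms |= objects(described, LEXICAL_FORM)
    return forms


print(sorted(sibling_forms(":gal-n")))
```

Under these assumptions, asking for the sibling forms of `:gal-n` returns the two forms attached to `:gal-v`, while asking for those of `:gal-v` returns nothing, because the other two lexical entries carry no forms of their own.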
This depends on how exactly you want to recreate the original structure that you have in the EPSD2.

I hope this helps. I might be missing some other options for a solution involving *lexicog*, so if you have any more ideas or suggestions, they are more than welcome!

Best,
Julia

On Wed, 20 Mar 2019 at 11:39, <peikert.katrin@web.de> wrote:

> Hello everyone,
>
> I am currently trying to create a Ontolex-model of the Electronic Penn
> Sumerian Dictionary (EPSD2, http://oracc.museum.upenn.edu/epsd2/sux).
> But several issues have arisen, which are not easily solvable within the
> current Ontolex version.
>
> The first issue concerns the presentation of verbal prefixes in Sumerian.
> While there are ways to describe different forms of the same word, there
> does not seem to be a way to do so by describing the underlying
> morphological process. As an example, consider the lexical entry
> (dictionary entry) for gal:
> http://oracc.museum.upenn.edu/epsd2/cbd/sux/sux.x0405180.html .
> Under "verbal prefixes", it lists for example ba.i.n (i.e., ba.i.n.V,
> which stands for the morphological gloss ba-i-n-gal, with three
> inflectional prefixes and the verbal root). Beyond the morphological
> segmentation, the analysis is not spelled out, but points to the original
> attestation(s). In OntoLex, it is however, already unclear how to
> represent the morphological segmentation in the first place.
>
> The second issue concerns senses and part of speech tags. In EPSD2 it is
> possible for an entry to have a "general" part of speech tag, but some
> senses of it have a different tag e.g. "gal"(big), which is characterized
> as a "V\i" , but it can also mean "goblet", which is tagged as "N". But
> since Ontolex does not allow an LexicalEntry to have more than one part of
> speech tag, it is unclear to me how one could model this phenomena. The
> lexicog solution would be to use a lexicog:Entry for "gal" in general, and
> three LexicalEntry-s for the three parts of speech. The problem is that
> EPSD2 stores information about the forms and their frequency for "gal",
> but not for gal with senses [1-5], [6] or [7] separately. It is unclear
> which form of a word is connected to which sense and how often this
> specific sense with a specific form is used. Thus, if you try to have
> several LexicalEntries of the same word, there is no way to preserve
> information about forms and their frequencies, as we cannot automatically
> disambiguate the forms. (Manually an expert can to a certain extent, the
> upper case strings in the forms are determinative, which specify certain
> semantic types, e.g., the material an object consists of, indicating a
> nominal or adjectival sense).
>
> It would be really great if there could be found a way to solve this
> issues.
>
>
> Best regards,
> Katrin Peikert
>
>
> *Goethe Universität *
> *Frankfurt am Main*
>

--
Julia Bosque Gil
PhD Student
Ontology Engineering Group <http://www.oeg-upm.net/>
Departamento de Inteligencia Artificial
Universidad Politécnica de Madrid
Received on Wednesday, 20 March 2019 12:44:05 UTC