- From: John P. McCrae <john.mccrae@insight-centre.org>
- Date: Wed, 20 Mar 2019 13:28:17 +0000
- To: Julia Bosque Gil <jbosque@fi.upm.es>
- Cc: peikert.katrin@web.de, public-ontolex <public-ontolex@w3.org>
- Message-ID: <CAHLDFnrhqWSva0iDUQAWDJ-bdPm+E45WNzKU6L5iB=MT_=y=jQ@mail.gmail.com>
Hi Katrin,

Thanks for your email. I would note that there is currently a module on Morphology under development: https://www.w3.org/community/ontolex/wiki/Morphology

If you would be able to contribute some of these issues to this, I think it would be very helpful for the development of this module.

Regards,
John

On Wed, 20 Mar 2019 at 12:45, Julia Bosque Gil <jbosque@fi.upm.es> wrote:

> Dear Katrin,
>
> I will try to provide some possible solutions for your second issue, concerning the single part-of-speech tag and the lexicog:Entry approach. My answers are inline ;)
>
> The second issue concerns senses and part of speech tags. In EPSD2 it is possible for an entry to have a “general” part of speech tag, but some senses of it have a different tag, e.g. “gal” (big), which is characterized as a “V\i”, but it can also mean “goblet”, which is tagged as “N”. But since Ontolex does not allow a LexicalEntry to have more than one part of speech tag, it is unclear to me how one could model this phenomenon. The lexicog solution would be to use a lexicog:Entry for “gal” in general, and three LexicalEntry-s for the three parts of speech.
>
> Exactly, you would have *gal-v*, with senses [1-5], *gal-n* with sense [6], and *gal-adj* with sense [8].
>
> The problem is that EPSD2 stores information about the forms and their frequency for “gal”, but not for gal with senses [1-5], [6] or [7] separately. It is unclear which form of a word is connected to which sense and how often this specific sense with a specific form is used.
>
> From what I understood, since this information is not explicitly provided in the dictionary, there is no way of automatically distinguishing this case from those in which *all forms* go with *all senses* unless you take into account the case difference in the strings. I see three possible ways of representing this, one of them easier in terms of querying, but overkill and leading to a high number of triples.
> The other two are more concise but would create some lexical entries without a form, and you would need to query the dictionary entry to get them.
>
> a) [*lots* of triples] Since these entries look in appearance like those in which all forms go with all senses, each created LexicalEntry receives all the forms, which would need to be triplicated. The disambiguation step in the future would involve an update to remove those Forms that are not realisations of the lexical entry at hand.
>
> b) [more concise] Only one LexicalEntry receives *all forms* (e.g. let us say, randomly, the one with the first sense, so gal-v), which might not be correct, but in this way there are no ontolex:Forms without a LexicalEntry. The other two LexicalEntries would not have a lexical form, but the lexicog:Entry would consist of LexicographicComponents that point to them via *describes*. lexicog:LexicographicComponents can also describe ontolex:Forms, since the range of the describes property is owl:Thing. If you state that the lexicog:Entry that includes components describing the three ontolex:LexicalEntries also has more components, each describing a Form, you can later on get a list of all the forms described in that dictionary entry. In this way, if you want to access the potential forms that would go with *gal-adj* or *gal-n*, you would need to perform a query in SPARQL: “Given that *gal-n* is described by a LexicographicComponent which is rdfs:member of a lexicog:Entry, give me all the ontolex:Forms that are described by LexicographicComponents which are also rdfs:member of that same lexicog:Entry”. Alternatively, “Given that *gal-n* is described by a LexicographicComponent which is rdfs:member of a lexicog:Entry, give me all the ontolex:Forms of other LexicalEntries that are described by LexicographicComponents which are also rdfs:member of that same lexicog:Entry”, and then you would get the forms linked to *gal-v*.
> For the last query you actually would not need to create LexicographicComponents describing Forms, because you access them via *gal-v* (unless you consider that the EPSD has indeed a section in that entry devoted to form description and you want to capture that).
>
> c) Just like (b), but the LexicographicComponents of the lexicog:Entry would not describe ontolex:LexicalEntries, but ontolex:LexicalSenses. This depends on how exactly you want to recreate the original structure that you have in the EPSD2.
>
> I hope this helps. I might be missing some other options of a solution involving *lexicog*, so, if you have any more ideas/suggestions, they are more than welcome!
>
> Best,
>
> Julia
>
> On Wed, 20 Mar 2019 at 11:39, <peikert.katrin@web.de> wrote:
>
>> Hello everyone,
>>
>> I am currently trying to create an Ontolex model of the Electronic Penn Sumerian Dictionary (EPSD2, <http://oracc.museum.upenn.edu/epsd2/sux>). But several issues have arisen which are not easily solvable within the current Ontolex version.
>>
>> The first issue concerns the presentation of verbal prefixes in Sumerian. While there are ways to describe different forms of the same word, there does not seem to be a way to do so by describing the underlying morphological process. As an example, consider the lexical entry (dictionary entry) for gal: <http://oracc.museum.upenn.edu/epsd2/cbd/sux/sux.x0405180.html>. Under "verbal prefixes", it lists for example ba.i.n (i.e., ba.i.n.V, which stands for the morphological gloss ba-i-n-gal, with three inflectional prefixes and the verbal root).
>> Beyond the morphological segmentation, the analysis is not spelled out, but points to the original attestation(s). In OntoLex, however, it is already unclear how to represent the morphological segmentation in the first place.
>>
>> The second issue concerns senses and part of speech tags. In EPSD2 it is possible for an entry to have a "general" part of speech tag, but some senses of it have a different tag, e.g. "gal" (big), which is characterized as a "V\i", but it can also mean "goblet", which is tagged as "N". But since Ontolex does not allow a LexicalEntry to have more than one part of speech tag, it is unclear to me how one could model this phenomenon. The lexicog solution would be to use a lexicog:Entry for "gal" in general, and three LexicalEntry-s for the three parts of speech. The problem is that EPSD2 stores information about the forms and their frequency for "gal", but not for gal with senses [1-5], [6] or [7] separately. It is unclear which form of a word is connected to which sense and how often this specific sense with a specific form is used. Thus, if you try to have several LexicalEntries of the same word, there is no way to preserve information about forms and their frequencies, as we cannot automatically disambiguate the forms. (An expert can do so manually to a certain extent: the upper-case strings in the forms are determinatives, which specify certain semantic types, e.g. the material an object consists of, indicating a nominal or adjectival sense.)
>>
>> It would be really great if a way could be found to solve these issues.
>>
>> Best regards,
>> Katrin Peikert
>>
>> *Goethe Universität*
>> *Frankfurt am Main*
>
> --
> Julia Bosque Gil
> PhD Student
> Ontology Engineering Group <http://www.oeg-upm.net/>
> Departamento de Inteligencia Artificial
> Universidad Politécnica de Madrid
Received on Wednesday, 20 March 2019 13:28:54 UTC
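
The second query described under option (b) in the thread can be sketched in plain Python, with an in-memory set of triples standing in for the RDF graph. This is only an illustration of the join, under the assumption that only gal-v carries the forms. All resource names (:gal_entry, :comp_v, :form_1 and so on) are hypothetical and not taken from EPSD2, and the helper functions are invented for this sketch.

```python
# Sketch of option (b): one lexicog:Entry, three LexicalEntries, and only
# gal-v carrying the forms. Every resource name below is hypothetical.
DESCRIBES = "lexicog:describes"
MEMBER = "rdfs:member"
LEXICAL_FORM = "ontolex:lexicalForm"

triples = {
    # The dictionary entry groups three lexicographic components.
    (":gal_entry", MEMBER, ":comp_v"),
    (":gal_entry", MEMBER, ":comp_n"),
    (":gal_entry", MEMBER, ":comp_adj"),
    # Each component describes one LexicalEntry.
    (":comp_v", DESCRIBES, ":gal-v"),
    (":comp_n", DESCRIBES, ":gal-n"),
    (":comp_adj", DESCRIBES, ":gal-adj"),
    # Assumption: all undisambiguated forms are attached to gal-v.
    (":gal-v", LEXICAL_FORM, ":form_1"),
    (":gal-v", LEXICAL_FORM, ":form_2"),
}


def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is in the graph."""
    return {s for (s, p, o) in triples if p == predicate and o == obj}


def objects(subj, predicate):
    """All objects o such that (subj, predicate, o) is in the graph."""
    return {o for (s, p, o) in triples if s == subj and p == predicate}


def sibling_forms(lexical_entry):
    """Forms of the *other* LexicalEntries described in the same lexicog:Entry."""
    forms = set()
    for component in subjects(DESCRIBES, lexical_entry):
        for dict_entry in subjects(MEMBER, component):
            for other_component in objects(dict_entry, MEMBER):
                for other_entry in objects(other_component, DESCRIBES):
                    if other_entry != lexical_entry:
                        forms |= objects(other_entry, LEXICAL_FORM)
    return forms


if __name__ == "__main__":
    print(sorted(sibling_forms(":gal-n")))
```

Asking for the sibling forms of :gal-n or :gal-adj walks from the entry to its component, up to the shared lexicog:Entry, and back down to the forms attached to :gal-v. The same walk from :gal-v finds nothing, because the other two entries carry no forms in this variant.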