Re: Issues concerning Morphology and Part of Speech Tags

Dear Christian,

Thank you for your comments; this is leading to a nice discussion indeed
:) My comments are inline below:

On Wed, 20 Mar 2019 at 15:52, Christian Chiarcos (<
chiarcos@informatik.uni-frankfurt.de>) wrote:

> On .03.2019, 13:45, Julia Bosque Gil <jbosque@fi.upm.es> wrote:
>
> Dear Katrin,
>
> I will try to suggest some possible solutions for your second issue,
> concerning the single part-of-speech tag and the lexicog:Entry approach.
> My answers are inline ;)
>
> The second issue concerns senses and part of speech tags. In EPSD2 it is
> possible for an entry to have a “general” part of speech tag, but some
> senses of it have a different tag, e.g. “gal” (big), which is characterized
> as a “V\i”, but it can also mean “goblet”, which is tagged as “N”. But
> since Ontolex does not allow a LexicalEntry to have more than one part of
> speech tag, it is unclear to me how one could model this phenomenon. The
> lexicog solution would be to use a lexicog:Entry for “gal” in general, and
> three LexicalEntry-s for the three parts of speech.
>
> Exactly, you would have *gal-v*, with senses [1-5], *gal-n* with sense
> [6], and *gal-adj* with sense [8].
>
> The problem is that EPSD2 stores information about the forms and their
> frequency for “gal”, but not for gal with senses [1-5], [6] or [7]
> separately. It is unclear which form of a word is connected to which sense
> and how often this specific sense with a specific form is used.
>
> From what I understood, since this information is not explicitly provided
> in the dictionary, there is no way of automatically distinguishing this
> case from those in which *all forms* go with *all senses* unless you take
> into account the case difference in the strings. I see three possible ways
> of representing this, one of them easier in terms of querying, but overkill
> and leading to a high number of triples. The other two are more concise but
> would create some lexical entries without a form, and you would need to
> query the dictionary entry to get them.
>
> a) [*lots* of triples] Since these entries look just like those
> in which all forms go with all senses, each created LexicalEntry receives
> all the forms, which would need to be triplicated. A future disambiguation
> step would involve an update to remove those Forms that are not
> realisations of the lexical entry at hand.
>
> The first problem is that there is frequency information to be added about
> the forms, and these frequencies refer to non-disambiguated forms. If we
> represent the ambiguity by double linking, this is semantically less
> incorrect. The second problem is that not all forms seem to go with all
> senses, but we cannot tell them apart, so representing this without a hint
> that it is ambiguous is just wrong.
>

Ooh, I see your point. Yes, option (a) would not represent the fact that it
is ambiguous.


> a'): Can't we just point from several lexical entries to the same form?
> The definition is "A *form* represents one grammatical realization of a
> lexical entry." and this is ambiguous regarding the interpretation of "a"
> as either "one" or an existential quantifier. The latter would permit
> multiple lexical entries per form.
> Note that the situation is *not* analogous with LexicalSense, where we
> cannot interpret "a" as an existential quantifier, because it is further
> elaborated as "a pair of a *uniquely determined* lexical entry and a
> uniquely determined ontology entity", but there doesn't seem to be a
> comparable restriction on ontolex:Form.
>

This would represent the ambiguity in an elegant way, I think, but then, in
the existential quantifier reading, the definition of ontolex:Form would be
interpreted as "A *form* represents one grammatical realization of *at
least* one lexical entry". I am just skeptical about whether a grammatical
realization can be a realization of two different entries at the same time
(vs. two different realizations that have the same properties) and whether
that interpretation would be accurate...
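
To make the difference between duplicating the forms and sharing them concrete, here is a minimal sketch in plain Python, with triples modelled as tuples and no RDF library. Only "gal", the three parts of speech and the figure of 198 attested forms come from this thread; the node names (gal-v, gal/form0, ...) are assumptions made up for illustration.

```python
# Hypothetical identifiers; only the entry "gal", its three parts of
# speech and the count of 198 attested forms are taken from the thread.
FORM_COUNT = 198
ENTRIES = ["gal-v", "gal-n", "gal-adj"]


def duplicated(entries, n_forms):
    """Option (a): every lexical entry gets its own copy of every form."""
    return {(e, "ontolex:lexicalForm", f"{e}/form{i}")
            for e in entries for i in range(n_forms)}


def shared(entries, n_forms):
    """Existential reading: all lexical entries point to the same form nodes."""
    return {(e, "ontolex:lexicalForm", f"gal/form{i}")
            for e in entries for i in range(n_forms)}


def form_nodes(triples):
    """Distinct form nodes that appear as objects."""
    return {o for _, _, o in triples}


def entries_of(triples, form):
    """Lexical entries linked to a given form node."""
    return sorted(s for s, _, o in triples if o == form)


dup = duplicated(ENTRIES, FORM_COUNT)
shr = shared(ENTRIES, FORM_COUNT)

print(len(form_nodes(dup)))          # 594 form nodes, each needing its own frequency
print(len(form_nodes(shr)))          # 198 form nodes, frequency stated once per form
print(entries_of(shr, "gal/form0"))  # ['gal-adj', 'gal-n', 'gal-v']
```

In the shared variant a form with three incoming lexicalForm links is visibly ambiguous, and the undisambiguated frequency is attached to a single node. Whether one realization may belong to several entries is still the open modelling question; the sketch does not settle it.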


>
> b) [more concise] Only one LexicalEntry receives *all forms* (e.g. let us
> say, arbitrarily, the one with the first sense, so gal-v), which might not
> be correct, but in this way there are no ontolex:Forms without a
> LexicalEntry. The other two LexicalEntries would not have a lexical form,
> but the lexicog:Entry would consist of LexicographicComponents that point
> to them via *describes*. lexicog:LexicographicComponents can also describe
> ontolex:Forms, since the range of the describes property is owl:Thing. If
> you state that the lexicog:Entry that includes components describing the
> three ontolex:LexicalEntries also has more components, each describing a
> Form, you can later on get a list of all the forms described in that
> dictionary entry. In this way, if you want to access the potential forms
> that would go with *gal-adj* or *gal-n*, you would need to perform a
> query in SPARQL: “Given that *gal-n* is described by a
> LexicographicComponent which is rdfs:member of a lexicog:Entry, give me all
> the ontolex:Forms that are described by LexicographicComponents which are
> also rdfs:member of that same lexicog:Entry”. Alternatively, “Given that
> *gal-n* is described by a LexicographicComponent which is rdfs:member of
> a lexicog:Entry, give me all the ontolex:Forms of other LexicalEntries that
> are described by LexicographicComponents which are also rdfs:member of that
> same lexicog:Entry”, and then you would get the forms linked to *gal-v*.
> For the last query you actually would not need to create
> LexicographicComponents describing Forms, because you access them via
> *gal-v* (unless you consider that the EPSD has indeed a section in that
> entry devoted to form description and you want to capture that).
>
> Pretty complicated, and the scope of form attestations and their
> frequencies would be equally incorrect as with duplicating all forms.
>
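
For reference, the two queries described in prose under option (b) could be sketched roughly as follows. This is plain Python over a toy set of triples rather than SPARQL against a real store. All ":"-prefixed node names are hypothetical, and the direction "dictionary entry rdfs:member component" is an assumption about how the lexicog:Entry is populated.

```python
# Toy graph as a set of (subject, predicate, object) triples.
# Node names are hypothetical; "entry rdfs:member component" is assumed.
TRIPLES = {
    (":gal", "rdf:type", "lexicog:Entry"),
    (":gal", "rdfs:member", ":c_v"),
    (":gal", "rdfs:member", ":c_n"),
    (":gal", "rdfs:member", ":c_adj"),
    (":gal", "rdfs:member", ":c_f1"),
    (":gal", "rdfs:member", ":c_f2"),
    (":c_v", "lexicog:describes", ":gal-v"),
    (":c_n", "lexicog:describes", ":gal-n"),
    (":c_adj", "lexicog:describes", ":gal-adj"),
    (":c_f1", "lexicog:describes", ":form1"),
    (":c_f2", "lexicog:describes", ":form2"),
    (":form1", "rdf:type", "ontolex:Form"),
    (":form2", "rdf:type", "ontolex:Form"),
    # Option (b): only gal-v is linked to the forms.
    (":gal-v", "ontolex:lexicalForm", ":form1"),
    (":gal-v", "ontolex:lexicalForm", ":form2"),
}


def objects(s, p):
    return {o for s2, p2, o in TRIPLES if s2 == s and p2 == p}


def subjects(p, o):
    return {s for s, p2, o2 in TRIPLES if p2 == p and o2 == o}


def described_with(lexical_entry):
    """Everything described inside the dictionary entries that describe lexical_entry."""
    things = set()
    for comp in subjects("lexicog:describes", lexical_entry):
        for dict_entry in subjects("rdfs:member", comp):
            for sibling in objects(dict_entry, "rdfs:member"):
                things |= objects(sibling, "lexicog:describes")
    return things


def forms_described_in_entry(lexical_entry):
    """First query: Forms described by components of the same lexicog:Entry."""
    return {t for t in described_with(lexical_entry)
            if (t, "rdf:type", "ontolex:Form") in TRIPLES}


def forms_of_sibling_entries(lexical_entry):
    """Second query: Forms of the other LexicalEntries in the same lexicog:Entry."""
    forms = set()
    for other in described_with(lexical_entry) - {lexical_entry}:
        forms |= objects(other, "ontolex:lexicalForm")
    return forms


print(sorted(forms_described_in_entry(":gal-n")))   # [':form1', ':form2']
print(sorted(forms_of_sibling_entries(":gal-n")))   # [':form1', ':form2']
```

The second function still works if the five rdfs:member and lexicog:describes triples for the form components are dropped, which matches the remark that those components are only needed for the first query.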

> c) Just like (b), but the LexicographicComponents of the lexicog:Entry
> would not describe ontolex:LexicalEntries, but ontolex:LexicalSenses. This
> depends on how exactly you want to recreate the original structure that you
> have in the EPSD2.
>
> d): using a yet-to-be-determined property from the morphology module that
> associates a LexicalEntry with another one from which, unless explicit
> forms are specified, it inherits all lexicalForm properties. From the current
> discussion, that could be a subproperty of the non-reified version of
> morph:DerivationalRelation (cf.
> https://www.w3.org/community/ontolex/wiki/Morphology, working examples)
> as suggested (by me ;) for "zero derivation" (morphology wiki, discussion
> under N11).
>

I agree that my option (b) complicates the representation and assumes some
forms to be associated with a particular entry (which might be incorrect,
but we cannot know which attested forms go with which entry, if I have
understood the problem correctly). So, without knowing to which entry each
form belongs, I only see three options (now revisited after your e-mail):

*(a)* All lexical entries receive the forms, triplicating the list of forms
("gal" in Sumerian has 198 attested forms without information about the
sense, so this option should probably be reconsidered...!)
*(b)* The entries share the forms by interpreting "a" in the definition of
ontolex:Form as an existential quantifier. What worries me here is that I am
not sure whether a "realisation" can be a realisation of more than one entry
at the same time.
*(c)* Only one lexical entry is linked to the forms. The other lexical
entries either inherit them from the first entry via your new property,
or we would need to access the forms through *lexicog* mechanisms.
Regarding this new property that you suggest, it makes a lot of sense to me
if you know beforehand that there is an "original" LexicalEntry (or one you
want to treat as "original") which does occur in a series of forms, and the
other lexical entries are realized with the same grammatical properties. It
would be a nice solution to the adjective-adverb issue in German that we
discussed in some calls on the *lexicog* module, as you
mentioned. But, for the example of the Sumerian data, I might have missed
something or got lost in the process: how do we know which forms are linked
to the "original" LexicalEntry in the first place, if there is no way to
know from the data which of the 198 forms of "gal" are connected to which
lexical entry (v, n, or adj)? In other words, are we preventing a wrong
scope of form attestations in any of the ways of implementing option (c)?
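
A minimal sketch of the inheritance reading of option (c), in plain Python with triples as tuples. The property name morph:reanalyzedAs is only one of the candidate names floated in this thread; no such property is defined yet, and the node names and the choice of gal-v as the "original" entry are assumptions for illustration.

```python
LEXICAL_FORM = "ontolex:lexicalForm"
CAST = "morph:reanalyzedAs"  # hypothetical property, not defined in any module yet

# Hypothetical data: gal-v is arbitrarily treated as the "original" entry.
TRIPLES = {
    (":gal-v", LEXICAL_FORM, ":form1"),
    (":gal-v", LEXICAL_FORM, ":form2"),
    (":gal-v", CAST, ":gal-n"),
    (":gal-v", CAST, ":gal-adj"),
    (":gal-adj", LEXICAL_FORM, ":form3"),  # explicit form, so nothing is inherited
}


def effective_forms(entry, triples, seen=frozenset()):
    """Own forms if any are stated; otherwise the forms of the source entries."""
    own = {o for s, p, o in triples if s == entry and p == LEXICAL_FORM}
    if own or entry in seen:  # the 'seen' guard stops cyclic links from looping
        return own
    inherited = set()
    for s, p, o in triples:
        if p == CAST and o == entry:
            inherited |= effective_forms(s, triples, seen | {entry})
    return inherited


print(sorted(effective_forms(":gal-n", TRIPLES)))    # [':form1', ':form2']
print(sorted(effective_forms(":gal-adj", TRIPLES)))  # [':form3']
```

Note that gal-n simply receives the whole form list of whichever entry was picked as the source. The sketch therefore reproduces the scope problem raised above instead of solving it: it says nothing about which of the attested forms really belong to the noun.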

If we knew which forms go with which lexical entries, the senses of each
entry would not necessarily need to occur with all the forms of that lexical
entry (see https://jogracia.github.io/ontolex-lexicog/#formrestriction).
The definition of Lexical Entry currently says otherwise ("A lexical entry
represents a unit of analysis of the lexicon that consists of a set of
forms that are grammatically related and a set of base meanings that are
associated with *all* of these forms" [bold typeface added]), but, if I am
not mistaken, we agreed in a telco to loosen that restriction.

Apologies if I have missed or wrongly understood something :) Thank you!

Best,

Julia


>
> In Sumerian, this is probably not a derivation proper, but a
> grammaticalization or lexicalization, so morph:zeroDerivation or the like
> would be slightly misplaced, but a possible alternative name could be
> morph:reanalyzedAs (referring to the grammaticalization process),
> morph:grammaticalizedAs, or morph:cast (by analogy with type casting in
> programming languages), and the definition of this property could be "a
> derivational relation between a lexical entry and another lexical entry
> with the same canonical form, but different part of speech. For a
> reanalyzed (grammaticalized, zero-derived) lexical entry, sense and form
> information is optional; if not provided, sense and/or form information is
> extended from (or identical with that of) the original lexical entry. (Note
> that this constraint only holds under the closed world assumption.) The
> canonical form of the target, if provided, *must* be string identical
> (modulo capitalization) to the canonical form of the source."
>
> Another application of this property would be the preposition-complementizer
> ambiguity in English, the adjective-adverb "derivation" in German or the
> preposition-adverb-particle ambiguity in most older West Germanic
> languages, so I think there's enough lexicographic motivation.
>
> Best,
> Christian
>
> I hope this helps. I might be missing some other options of a solution
> involving *lexicog*, so, if you have any more ideas/suggestions, they are
> more than welcome!
>
> Best,
>
> Julia
>
> On Wed, 20 Mar 2019 at 11:39, <peikert.katrin@web.de> wrote:
>
>> Hello everyone,
>>
>> I am currently trying to create an Ontolex model of the Electronic Penn
>> Sumerian Dictionary (EPSD2, http://oracc.museum.upenn.edu/epsd2/sux).
>> But several issues have arisen which are not easily solvable within the
>> current Ontolex version.
>>
>> The first issue concerns the presentation of verbal prefixes in Sumerian.
>> While there are ways to describe different forms of the same word, there
>> does not seem to be a way to do so by describing the underlying
>> morphological process. As an example, consider the lexical entry
>> (dictionary entry) for gal:
>> http://oracc.museum.upenn.edu/epsd2/cbd/sux/sux.x0405180.html
>> Under "verbal prefixes", it lists for example ba.i.n (i.e., ba.i.n.V,
>> which stands for the morphological gloss ba-i-n-gal, with three
>> inflectional prefixes and the verbal root). Beyond the morphological
>> segmentation, the analysis is not spelled out, but points to the original
>> attestation(s). In OntoLex, however, it is already unclear how to
>> represent the morphological segmentation in the first place.
>>
>> The second issue concerns senses and part of speech tags. In EPSD2 it is
>> possible for an entry to have a "general" part of speech tag, but some
>> senses of it have a different tag, e.g. "gal" (big), which is characterized
>> as a "V\i", but it can also mean "goblet", which is tagged as "N". But
>> since Ontolex does not allow a LexicalEntry to have more than one part of
>> speech tag, it is unclear to me how one could model this phenomenon. The
>> lexicog solution would be to use a lexicog:Entry for "gal" in general, and
>> three LexicalEntry-s for the three parts of speech. The problem is that
>> EPSD2 stores information about the forms and their frequency for "gal",
>> but not for gal with senses [1-5], [6] or [7] separately. It is unclear
>> which form of a word is connected to which sense and how often this
>> specific sense with a specific form is used. Thus, if you try to have
>> several LexicalEntries of the same word, there is no way to preserve
>> information about forms and their frequencies, as we cannot automatically
>> disambiguate the forms. (An expert can do so manually to a certain
>> extent: the upper-case strings in the forms are determinatives, which
>> specify certain semantic types, e.g., the material an object consists of,
>> indicating a nominal or adjectival sense.)
>>
>> It would be really great if a way could be found to solve these
>> issues.
>>
>>
>> Best regards,
>> Katrin Peikert
>>
>>
>> *Goethe Universität *
>> *Frankfurt am Main*
>>
>
>
> --
>
> Julia Bosque Gil
> PhD Student
> Ontology Engineering Group <http://www.oeg-upm.net/>
> Departamento de Inteligencia Artificial
> Universidad Politécnica de Madrid
>
>
>
>
> --
> Prof. Dr. Christian Chiarcos
> Applied Computational Linguistics
> Johann Wolfgang Goethe Universität Frankfurt a. M.
> 60054 Frankfurt am Main, Germany
>
> office: Robert-Mayer-Str. 11-15, #107
> mail: chiarcos@informatik.uni-frankfurt.de
> web: http://acoli.cs.uni-frankfurt.de
> tel: +49-(0)69-798-22463
> fax: +49-(0)69-798-28334
>



Received on Wednesday, 20 March 2019 18:00:06 UTC