RE: Meaning and Semiotics - Issues for Modelling from Anne Schumann on 2012-08-21 (public-ontolex@w3.org from August 2012)

From: Anne Schumann <anne.schumann@Tilde.lv>
Date: Tue, 21 Aug 2012 13:44:18 +0300
To: John McCrae <jmccrae@cit-ec.uni-bielefeld.de>
CC: "public-ontolex@w3.org" <public-ontolex@w3.org>
Message-ID: <AC6FD4BB9BB02540AC7322091A6C3B5472AFFDBF38@postal.Tilde.lv>
Hi,

Please allow a few more comments. The diminutive is not as marginal as it may have seemed from what I wrote earlier. In Russian, diminutives can be applied to many different kinds of nouns (not just names) and to adjectives. Besides Balto-Slavonic, other languages also use similar features, Italian being a rather interesting case where diminutives seem to be employed both for deriving new lexical units and for pragmatic marking. Anyway, the more general point I was trying to make is that, as it seems to me, some of these morphemes differ from standard inflectional (or derivational) morphology in that they really activate different shades of the concept invoked by the lexical unit. They do not just reflect the grammatical structure of an utterance, but contribute new semantic information about the concept in question. In other words, whether a verb takes genitive or accusative case seems largely arbitrary as far as the concept is concerned, but the same is not true for the issues discussed here: not every diminutive, for example, can be applied to every noun. Therefore, these features, for me, belong to the interface between linguistic and semantic knowledge.
As for “fressen”, I doubt that this is a case of polysemy. Another contributor to this discussion mentioned prototype theory earlier, and maybe this is a more expressive approach for this particular case. Similarly, I tend to consider aspect not a purely morphological phenomenon but rather an interface phenomenon as well (and things get even more complicated when aspect interacts with Aktionsart).

Cheers,
anne

From: johnmccrae@gmail.com [mailto:johnmccrae@gmail.com] On Behalf Of John McCrae
Sent: Monday, August 20, 2012 7:17 PM
To: Anne Schumann
Cc: public-ontolex@w3.org
Subject: Re: Meaning and Semiotics - Issues for Modelling

Hi all,

I added Piek's list of features for the lexicon here http://www.w3.org/community/ontolex/wiki/Specification_of_Requirements/Lexicon-Ontology-Mapping


I think that 1-6 are fairly uncontroversially part of the lexicon, and we should have an agreement on the modelling here.

7. "probability of packaging" seems to move from lexicography to some kind of "meta-lexicography", i.e., from "MeSH ID D030361 is lexicalised as 'HPV'" to "MeSH IDs are frequently lexicalised as abbreviations". Not necessarily a bad thing, but it stretches the scope of the group. (That said, both LMF and lemon have systems for describing regular inflection, e.g., "English nouns pluralize by adding s", that also count as "meta-lexicography", and I would argue that falls under the scope of the group.)
8. Subjectivity and connotation are something I admit we do not really try to model in lemon<http://www.lemon-model.net> (and I believe they are not handled by LMF)... I am not sure what the modelling would look like here (but it would be good to see some proposals, *hint*, *hint*...)
9. Not sure what is meant by "social roles"... are you referring to something like this http://en.wikipedia.org/wiki/Honorific_speech_in_Japanese?

10. Again, I am not entirely sure what is intended; for me, gender and aspect mostly count as morphological issues... however, perhaps you mean true semantic distinctions in syntax such as http://en.wikipedia.org/wiki/Luganda#Noun_classes or http://en.wikipedia.org/wiki/Chinese_counter? Examples from Japanese:
Niwa no hato = 2 doves (birds)
Nihiki no hotaru = 2 fireflies (small animals)
Niko no tane = 2 seeds (small objects)
Nihon no hon = 2 books/scrolls (long objects)
Nimai no sara = 2 plates (flat objects)
Futari no onna = 2 women (people)
These form a difficult class, as these distinctions come from the ontology but affect the syntax, unlike gender in European languages (cf., das Kind = child (neuter)).
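
A minimal sketch of that dependency, assuming a toy ontology: the counter is looked up through the noun's ontological class rather than stored as an arbitrary feature of the noun. The table names and the function `counter_for` are hypothetical; only the nouns and their classes are taken from the examples above, and the counters are given in their base form to sidestep the sound changes that occur after numerals.

```python
# Toy illustration (hypothetical names): the counter is selected by the
# noun's ontological class, so the ontology constrains the syntax.

# Ontology side: the class each noun's referent belongs to.
ONTOLOGY_CLASS = {
    "hato": "bird",
    "hotaru": "small_animal",
    "tane": "small_object",
    "sara": "flat_object",
    "onna": "person",
}

# Lexicon/syntax side: the counter each ontological class selects (base forms).
COUNTER_BY_CLASS = {
    "bird": "wa",
    "small_animal": "hiki",
    "small_object": "ko",
    "flat_object": "mai",
    "person": "nin",  # one and two people use the irregular forms hitori/futari
}


def counter_for(noun: str) -> str:
    """Return the counter for a noun by going through its ontological class."""
    return COUNTER_BY_CLASS[ONTOLOGY_CLASS[noun]]


print(counter_for("hato"))
print(counter_for("sara"))
```

Gender in European languages would not fit this pattern: it would have to be stored directly on the lexical entry, because no ontological class predicts it.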

And in response to Anne:

For me, the diminutive is a morphosyntactic property, and I have included it under Piek's point 3. This is based on my guess that we likely don't want to go much deeper than simple properties for modelling here. The connotative effect of the diminutive in Russian is very interesting, but it seems quite a narrow case... can anyone perhaps situate it in a more general linguistic context? (i.e., modelling that is only useful for Russian gangsters' names is likely not so pressing ;) ).

Modality and aspect are standard morphosyntactic features; I agree that they should be modelled (but I would again file them under point 3).

Metaphor and Irony... I think this leads to a long discussion. My opinion is that lexical entries (words + phrases) have some core meaning(s)*, e.g., "fressen" means "to eat (like an animal)", and even if the interpretation in an actual example is different, this can be treated as a phenomenon external to the ontology-lexicon. That means I am not concerned about "The White House announced a new policy", even though the "White House" is a building and can certainly not announce or even speak; furthermore, I do not feel it is necessary to introduce another sense for the phrase "The White House", such as "the representatives of the U.S. Government", into the lexicon; rather, it is the duty of the NLP system to make this "leap" by itself. But I would love to hear other opinions...

Regards,
John

* i.e., a distinction is made between systematic and non-systematic polysemy. See Paul Buitelaar's or Wim Peters' work on this...

On Mon, Aug 20, 2012 at 10:55 AM, Anne Schumann <anne.schumann@tilde.lv> wrote:
Dear Ontolex group,

I have been reading this discussion with great interest, having just returned from my holidays (hence the late reply). Although I am not formally a member of this group, I would like to comment on some of the aspects that have been discussed so far, hoping my ideas are not completely arbitrary. About myself: I am doing a PhD in terminology at the University of Vienna and have therefore spent some time thinking about most of the issues touched upon in this discussion.
Although the term “ontology” is often used in terminology just with the meaning of “concept system” (without most of the fancy reasoning abilities attributed to ontologies in more technical circles), foundational work was carried out as early as the 1930s by an Austrian engineer called Eugen Wüster. Since terminological practice deals mainly with specialized vocabularies, mainstream approaches are rather successful in neatly separating linguistic and conceptual properties. In practice, however, problems related to term variation, term evolution, normativity/descriptivity, prototypicality etc. (some of these already mentioned in the discussion) are evident.
I do not think that this happens by chance. In fact, as far as I understand the scope of the work discussed here, the challenge also lies in providing ways to model the nexus between linguistic form and meaning (the semiotic triangle mentioned earlier) more expressively, and for a wider range of languages, than the standard approaches, which may be appropriate for medical terminology or for classifying types of steel but not for other areas of language. Therefore, I really liked the list of linguistic features provided by Piek Vossen. To his list, I would like to add some features that, in my view, lie at the heart of the problem, since they are mainly pragmatic (?) in nature and therefore not merely “linguistic” (and thus, maybe, should not be tucked away in the lexicon), but really connect linguistic form and conceptual meaning in a regular fashion:


- Diminutives. Some languages, e.g. Russian, employ diminutives as complex pragmatic markers (that is, not just as markers of something being small). Some diminutives of Russian names have complex sociolinguistic functions: expressing social hierarchies (Anja – Anjechka), affection (Anja – Anjuta/Anjutka, Sergej – Serjozha), or even membership of a criminal organization or at least a gangster-like character (Sergej – Serjoga). Others, however, seem to be devoid of any specific meaning.

- Modality and reported speech (subjonctif, congiuntivo). In some languages, the expression of modality is grammaticalised; compare, e.g., Latvian “Tu eji” (you go) with “Tev ir jāiet” (you have to go). In certain Russian constructions, by contrast, the type of modality (you should go, you have to go, you can go) is completely opaque. Romance languages, for their part, have special paradigms for the expression of reported speech (lui avrebbe detto). Since these features are productive, however, it may be reasonable to model them in some way.

- Metaphoric usage and irony. The German dichotomy of “essen” vs. “fressen” leaves ample room for examples (jmd. zum Fressen gern haben; du sollst nicht fressen, sondern essen; den hab ich wirklich gefressen! ...). I have some doubts that modeling this on the linguistic level (e.g. as different word senses) is the optimal solution (according to my intuition, at least, they are not different senses).

It has been pointed out that the decisions regarding these issues may be taken in a pragmatic way. I hope, however, that my comments were useful.

Best regards,
Anne-Kathrin Schumann


From: Piek Vossen [mailto:piek.vossen@vu.nl]
Sent: Sunday, August 12, 2012 4:08 PM
To: Aldo Gangemi
Cc: public-ontolex@w3.org; Guido Vetere; John McCrae
Subject: Re: Meaning and Semiotics - Issues for Modelling

This is a lengthy reply, and thanks for the explanations about the tools and approach.

Some think that any formalization of a lexical property can also be represented in the ontology, and some think that how to do so is at least still debated. I guess that nobody has claimed so far that a formalization of a linguistic property cannot be expressed in an ontology in OWL.

But let me try to make the discussion more bullet-wise by drawing up a (non-comprehensive) list of typical linguistic features that are normally not represented in an ontology:

1. dialect, age-group, formal, informal registers, etc.
 -> probably not in OWL
2. relations between different meanings (polysemy relations) such as specialization, generalization, metonymy, metaphor
 -> probably not in OWL
3. morphological properties
 -> probably not in OWL
4. pronunciation
 -> probably not in OWL
5. syntax:
 -> probably not in OWL but:
- verb syntax needs to be mapped to argument structure, and argument structure mapped to event structure in the ontology: buy and sell are variants with different syntactic mappings to the same event structure. Other examples are “teach” and “learn” (in some languages expressed by the same verb).
- The same holds for countability, which is partially semantic and partially a form choice.
6. collocational constraints, e.g. blow your nose and clear your throat
 -> probably not in OWL
7. probability of packaging: complex concepts are typically phrased as e.g. adjective noun, compound, prepositional phrase, verb phrase, etc.
 -> probably not in OWL
8. subjectivity relations and connotations
 -> partly this can be reflected in the ontology in the form of a social role, but it is (usually) not done
9. social roles
 -> this is a border case: it can and probably should be reflected in the ontology, but it is extremely rich and complex
10. some others: gender in many languages, politeness markers, aspectual properties....

So there is a lot of work to do to flesh this out and to make a proposal that reflects best practice

best

Piek



On Aug 11, 2012, at 8:14 PM, Aldo Gangemi wrote:

Hi all, let me try a contribution from my holidays.

I appreciate Piek's attempt to distinguish three aspects of the ontology-lexicon modeling. My answers are (1) that any semantic aspect of a lexical unit can be formalized (either in crisp or fuzzy logics), (2) that modeling any of those aspects can be useful in at least some task, and (3) that if we get more concepts from a lexicon, we should be entitled to reuse them in order to evolve ontologies.
Therefore, I am totally liberal about all flows from lexicons to ontologies. I am also open to any procedure to derive lexicons from (impoverished) ontologies.

The reason why I am so liberal is that I fully accept the consequences of adopting semiotics, the only theory that is expressive enough to consider linguistic and logical semantics as special cases. Let me explain this as briefly as possible.

The core problem we are discussing is the actual nature of meaning as it is represented in NL/lexicons or in ontologies. This problem goes deep into philosophy, linguistics and logic, as Guido said. However, the Semantic Web does not claim to go so deep, so semiotics should be enough to make sense of the problem. (Those with plenty of time might read my chapter in Ontology and the Lexicon, Cambridge University Press.)

Semiotics assumes three aspects/roles of a "sign" relation that is instantiated every time there is an ongoing linguistic (or generally semiotic) activity: the *expression*, the *meaning*, and the *reference*. Any linguistic activity involves the use of some expression that typically gets a meaning in context, and usually denotes a referenced individual or collection.

A lot of different things can play one of the three roles: I can use the string "buy" as an expression in the utterance "would I buy it again?" or in "<buy> has three letters", or as a meaning in the discourse "What does the word <comprare> mean? To buy", or even as a reference in the sentence "Buy is a dual concept to Sell".

Now, both lexicons and ontologies contain elements that nicely distribute into the roles of a semiotic sign relation. E.g. WordNet has words (expressions), word senses and synsets (meanings), and instances (references). An ontology has labels (expressions), class and property IDs (?meanings?), individuals and facts (references), as well as "formal interpretations", e.g. class or property extensions, which would be references as well in the semiotic framework.
There remain other strange beasts such as comments/glosses/definitions, either from lexicons or ontologies, which could be considered expressions to be analyzed, or directly as paraphrastic meanings.
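
The role assignments above can be written down as a small table. This is only a sketch: the enum `Role`, the dictionaries `WORDNET` and `ONTOLOGY`, and the helper `elements_playing` are hypothetical names, and the assignments simply restate the paragraph (including its tentative status for class and property IDs).

```python
from enum import Enum


class Role(Enum):
    """The three roles of the semiotic sign relation."""
    EXPRESSION = "expression"
    MEANING = "meaning"
    REFERENCE = "reference"


# Role assignments as stated in the text above (hypothetical table names).
WORDNET = {
    "word": Role.EXPRESSION,
    "word sense": Role.MEANING,
    "synset": Role.MEANING,
    "instance": Role.REFERENCE,
}

ONTOLOGY = {
    "label": Role.EXPRESSION,
    "class ID": Role.MEANING,     # tentative: marked "?meanings?" in the text
    "property ID": Role.MEANING,  # tentative: marked "?meanings?" in the text
    "individual": Role.REFERENCE,
    "fact": Role.REFERENCE,
    "class extension": Role.REFERENCE,
    "property extension": Role.REFERENCE,
}


def elements_playing(role: Role, resource: dict) -> list:
    """List the element kinds of a resource that play the given semiotic role."""
    return sorted(name for name, played in resource.items() if played is role)


print(elements_playing(Role.MEANING, WORDNET))
print(elements_playing(Role.MEANING, ONTOLOGY))
```

Glosses and definitions are left out of both tables on purpose, since the text leaves open whether they count as expressions or as meanings.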

At this point, what is it that distinguishes ontologies from lexicons? Mainly formal interpretation, it seems, with all the reasoning machinery (set-theoretic, model-theoretic, possible worlds, etc.) that comes with it. A very important feature indeed. But if we remove that machinery for a second, ontologies are just quite structured lexicons, which is often the actual meaning of "ontology" assumed by linked data people, who usually prefer the term "vocabulary" :). We all know that "ontology as controlled vocabulary" is a well-accepted meaning.

My constructive proposal is that a standard that accommodates any task aggregating lexical and ontological knowledge should be able to express *both* the similarities and the differences between lexicons and ontologies.

With the thread example, the synset wordnet3:synset-buy-verb-1 can be a meaning, just as e.g. the OWL class http://ontosem.org/#buy can. One may reason with wordnet3:synset-buy-verb-1 as a class if the case requires it (e.g. if WSD is used for ontology learning), just as one may reason with http://ontosem.org/#buy as a word sense if a different case requires it (e.g. if ontology designers discuss the actual meaning of the http://ontosem.org/#buy class). I provide here a concrete example of the first case.

In a recent tool developed at STLab, called Tipalo [1], we derive OWL taxonomies from Wikipedia definitions extracted from page abstracts. In doing so we apply deep parsing that creates a DRT logical model from the NL definition; then we produce an OWL model from that, disambiguate class names to WordNet senses, and resolve as many individuals as possible to DBpedia. For example, if we ask Tipalo to produce an OWL model for the Wikipedia entity "Wind instrument" [2], the following definition is extracted:

"A wind instrument is a musical instrument that contains some type of resonator, in which a column of air is set into vibration by the player blowing into a mouthpiece set at the end of the resonator."

After the parse is produced, Tipalo extracts the relations that are appropriate for a taxonomy, resolves some names to DBpedia entities, and disambiguates some to WordNet (by using UKB currently), thereby asserting owl:equivalentClass axioms between classes extracted from the logical representation of the text and WordNet synsets. In OWL2 semantics, this makes those synsets regular classes, and formal reasoning is enabled on them.
An OWL model for the example is produced and visualized in the enclosed picture:

<Wind_instrument.png>

In my view then "direct reference" of lexical units to ontology classes is fine, provided however that both lexical units and classes can be *equally* considered semiotic meanings, and can be made interoperable by doing something as simple as what we do in Tipalo.
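
The effect of such an owl:equivalentClass axiom can be sketched with plain triples. This is not Tipalo's code or output: the identifiers `ex:WindInstrument` and the two `wn:` synset names are made up for illustration, and the function below is a tiny hand-written closure, not an OWL reasoner.

```python
EQUIVALENT = "owl:equivalentClass"
SUBCLASS = "rdfs:subClassOf"

# Hypothetical triples: a class extracted from text, aligned to a synset
# that already has a place in a (synset-based) taxonomy.
TRIPLES = {
    ("ex:WindInstrument", EQUIVALENT, "wn:synset-wind_instrument-noun-1"),
    ("wn:synset-wind_instrument-noun-1", SUBCLASS, "wn:synset-musical_instrument-noun-1"),
}


def superclasses(cls: str, triples: set) -> set:
    """Collect superclasses of cls, treating equivalent classes as interchangeable."""
    visited = {cls}
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for subj, pred, obj in triples:
            if pred == EQUIVALENT and current in (subj, obj):
                # Equivalence works in both directions.
                nxt = obj if current == subj else subj
            elif pred == SUBCLASS and subj == current:
                found.add(obj)
                nxt = obj
            else:
                continue
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(nxt)
    return found


print(superclasses("ex:WindInstrument", TRIPLES))
```

Once the equivalence is asserted, the extracted class inherits the synset's superclass, which is the sense in which the synset behaves as a regular class.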

For Piek, notice that this solution complies with my answers to your questions: if an aspect of lexical meaning is useful, integrate it in ontology-based models/reasoning; if new meanings are needed/discovered, just integrate them.

For Guido, the game we play in Senso Comune is a bit special, because by "ontology" we really mean a fully axiomatized foundational ontology, and of course we want to be careful in distinguishing meaning coming from a dictionary like De Mauro's and meaning coming from DOLCE. However, the ground similarity between those meanings is there, and nothing prevents us (in principle) from introducing into DOLCE a meaning derived from a dictionary word sense. Such provenance distinctions about authoritativeness, formal axiomatization etc. can be preserved by adding some punning to classes and properties :).

Ehm, now my message has grown substantially, but some time ago I promised to clarify my semiotics.owl pattern, so this is a way of doing it.

Ciao
Aldo


[1] http://wit.istc.cnr.it/stlab-tools/tipalo

[2] http://en.wikipedia.org/wiki/Wind_instrument





On 11 Aug 2012, at 09:33, Guido Vetere wrote:

Piek Vossen <piek.vossen@vu.nl> wrote on 10/08/2012 10.05.38:

> Piek Vossen <piek.vossen@vu.nl>
> 10/08/2012 10.05
>
> To
>
> Guido Vetere/Italy/IBM@IBMIT
>
> cc
>
> <public-ontolex@w3.org>
>
> Subject
>
> Re: Meaning and Semiotics - Issues for Modelling
>
> Dear all,
>
> I would like to discuss this at another level. We should first
> answer the question:
>
> 1. Is there any semantic aspect of a word sense (I prefer lexical
> unit) that cannot be represented in an ontological model?
>
> It may not be easy but I think you can, if you allow semantics in
> the ontology that incorporates probabilities and prototypicality.
> I think that any formalization of lexical meaning can be turned into
> an ontological meaning, simply because it is a formalization.
> If it is not a formalization then the lexical meaning is ill-defined
> and we need to do more (empirical) work to learn about the word and its usage.
>

As far as we can formalize lexical meanings, we can represent them in a formal way; this is true (by definition). But what we can formalize, and how, is a very open issue in philosophy of language and logic, respectively. Frege and Tarski warned against using formal logic for modeling natural language, in vain. As a matter of fact, modern logicians are still striving to look at linguistic phenomena through the lens of Truth, which is quite problematic in many cases. In fact, we lack a generally agreed (and positive) 'theory of meaning', and I'm afraid this is not just a problem of 'empirical work'. Of course, we cannot solve philosophical puzzles here, but I think that we should take them into account, somehow.

> 2. Do you want to model any semantic aspect that characterizes a
> word sense also in the ontology?
>
> This is another question. If we want to model pure logical
> reasoning, there may be many lexical aspects (not just the pragmatic
> knowledge) that we do not need
> in the ontology. We do not need to represent “buy” and “sell”
> separately to reason over the financial transaction process.
>

I agree: for most computational tasks, there would be no need to represent any semantic aspect of a word sense, even if it were possible.

> 3. What do we do in situations where lexicons are far richer
> than any ontology available, so that we cannot provide
> sufficient ontological labels to model the lexicons?
>

> This is a more practical and pragmatic question. If the lexicon is
> so large, complex and rich, why not use a two-layered solution where
> lexical relations take the burden off the ontology and the ontology
> takes the burden of deeper reasoning (need to define how deep we
> need to go). So in the lexicon, I can say that one word is the
> informal word for “eat” and another word is the neutral label for
> “eat”. In the ontology, we just have “eat”. Many lexicalized
> concepts are either pragmatic variants or can be defined using intersecting
> properties as described by Philipp for “bald”.
>

I like this idea of the 'two layers' very much: ontology should allow reasoning on real world structures (e.g. parts, phases, etc.) while lexica should account for linguistic habits and games. By the way, Quine drew a line to distinguish 'ontology' (what is there) from 'ideology' (the way we conceptualize it through language). Maybe we can start from there...
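
A minimal sketch of the two-layered idea, under stated assumptions: the ontology layer holds a single concept for eating, and the lexicon layer carries the pragmatic variation. The class `LexicalEntry`, the concept name `Eat`, and the English example lemmas are illustrative choices, not part of any existing model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LexicalEntry:
    lemma: str
    register: str  # pragmatic information, kept in the lexicon layer only
    concept: str   # pointer into the ontology layer


# Ontology layer (hypothetical): one concept, no register distinctions.
ONTOLOGY = {"Eat": {"subclass_of": "Event"}}

# Lexicon layer (hypothetical entries): several words share the one concept.
LEXICON = [
    LexicalEntry("eat", "neutral", "Eat"),
    LexicalEntry("chow down", "informal", "Eat"),
]


def lemmas_for(concept: str, register: Optional[str] = None) -> list:
    """Return lemmas lexicalising a concept, optionally filtered by register."""
    return [
        entry.lemma
        for entry in LEXICON
        if entry.concept == concept and register in (None, entry.register)
    ]


print(lemmas_for("Eat"))
print(lemmas_for("Eat", register="informal"))
```

Reasoning over events would only ever see `Eat`; the choice between the neutral and the informal word is resolved in the lexicon.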

Regards,

Guido Vetere
Manager, Center for Advanced Studies IBM Italia
_________________________________________________
Rome                                     Trento
Via Sciangai 53                       Via Sommarive 18
00144 Roma, Italy                   38123 Povo in Trento, Italy
+39 (0)6 59662137                 +39 (0)461 312312

Mobile: +39 3357454658
_________________________________________________

IBM Italia S.p.A.
Sede Legale: Circonvallazione Idroscalo - 20090 Segrate (MI)
Cap. Soc. euro 347.256.998,80
C. F. e Reg. Imprese MI 01442240030 - Partita IVA 10914660153
Società con unico azionista
Società soggetta all’attività di direzione e coordinamento di International Business Machines Corporation

(Salvo che sia diversamente indicato sopra / Unless stated otherwise above)


Aldo Gangemi
Senior Researcher
Semantic Technology Lab (STLab)
Institute for Cognitive Science and Technology,
National Research Council (ISTC-CNR)
Via Nomentana 56, 00161, Roma, Italy
Tel: +390644161535
Fax: +390644161513
aldo.gangemi@cnr.it
http://www.stlab.istc.cnr.it
http://www.istc.cnr.it/people/aldo-gangemi

skype aldogangemi
okkam ID: http://www.okkam.org/entity/ok200707031186131660596



Piek Vossen
Professor Computational Lexicology

T +31 (0)20 59 86457 | piek.vossen@vu.nl | http://www.vossen.info |
ADDRESS: de Boelelaan 1105, 1081 HV Amsterdam, The Netherlands | Disclaimer<http://www.vu.nl/nl/over-de-vu/vu-website/e-mail-disclaimer/disclaimer-tekst-e-mail/index.asp>
Received on Tuesday, 21 August 2012 10:44:53 UTC