Hi all, here is an attempt at a contribution while on holiday.
I appreciate Piek's attempt to distinguish three aspects of ontology-lexicon modeling. My answers are (1) that any semantic aspect of a lexical unit can be formalized (in either crisp or fuzzy logics), (2) that modeling any of those aspects can be useful in at least some task, and (3) that if we get more concepts from a lexicon, we should be entitled to reuse them in order to evolve ontologies.
Therefore, I am totally liberal about all flows from lexicons to ontologies. I am also open to any procedure for deriving lexicons from (impoverished) ontologies.
The reason why I am so liberal is that I fully accept the consequences of adopting semiotics, the only theory that is expressive enough to consider linguistic and logical semantics as special cases. Let me explain this as briefly as possible.
The core problem we are discussing is the actual nature of meaning as it is represented in NL/lexicons or in ontologies. This problem goes deep into philosophy, linguistics and logic, as Guido said. However, the Semantic Web does not claim to go that deep, so semiotics should be enough to make sense of the problem. (Those with plenty of time might read my chapter in Ontology and the Lexicon, Cambridge University Press.)
Semiotics assumes three aspects/roles of a "sign" relation that is instantiated every time there is an ongoing linguistic (or generally semiotic) activity: the *expression*, the *meaning*, and the *reference*. Any linguistic activity involves the use of some expression that typically gets a meaning in context, and usually denotes a referenced individual or collection.
A lot of different things can play one of the three roles: I can use the string "buy" as an expression in the utterance "would I buy it again?" or in "<buy> has three letters", or as a meaning in the discourse: "What does the word <comprare> mean? To buy", or even as a reference in the sentence "Buy is a dual concept to Sell".
Now, both lexicons and ontologies contain elements that distribute nicely into the roles of a semiotic sign relation. E.g. WordNet has words (expressions), word senses and synsets (meanings), and instances (references). An ontology has labels (expressions), class and property IDs (?meanings?), individuals and facts (references), as well as "formal interpretations", e.g. class or property extensions, which would be references as well in the semiotic framework.
There remain other strange beasts such as comments/glosses/definitions, either from lexicons or ontologies, which could be considered expressions to be analyzed, or directly as paraphrastic meanings.
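The distribution of roles described above can be written down as a small lookup table. This is only a sketch: the resource and element names are informal labels chosen for illustration, not identifiers from WordNet, OWL, or the semiotics.owl pattern. Class and property IDs are entered as meanings, although the text above marks that assignment as open to question.

```python
from enum import Enum


class Role(Enum):
    EXPRESSION = "expression"
    MEANING = "meaning"
    REFERENCE = "reference"


# Informal (resource, element kind) labels; all names here are illustrative.
# Glosses and comments get two roles because they can be read either as
# expressions to be analyzed or directly as paraphrastic meanings.
ROLE_OF = {
    ("wordnet", "word"): {Role.EXPRESSION},
    ("wordnet", "word sense"): {Role.MEANING},
    ("wordnet", "synset"): {Role.MEANING},
    ("wordnet", "instance"): {Role.REFERENCE},
    ("wordnet", "gloss"): {Role.EXPRESSION, Role.MEANING},
    ("ontology", "label"): {Role.EXPRESSION},
    ("ontology", "class ID"): {Role.MEANING},      # marked "?meanings?" in the text
    ("ontology", "property ID"): {Role.MEANING},   # marked "?meanings?" in the text
    ("ontology", "individual"): {Role.REFERENCE},
    ("ontology", "fact"): {Role.REFERENCE},
    ("ontology", "extension"): {Role.REFERENCE},
    ("ontology", "comment"): {Role.EXPRESSION, Role.MEANING},
}


def elements_playing(role):
    """Return the (resource, element kind) pairs that can play the given role."""
    return sorted(key for key, roles in ROLE_OF.items() if role in roles)


def shared_roles(kind_a, kind_b):
    """Return the roles that two element kinds have in common."""
    return ROLE_OF[kind_a] & ROLE_OF[kind_b]
```

For instance, `shared_roles(("wordnet", "synset"), ("ontology", "class ID"))` yields the meaning role, which is the similarity between lexicons and ontologies that the rest of this message relies on.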
At this point, what is it that distinguishes ontologies from lexicons? Mainly formal interpretation, it seems, with all the reasoning machinery (set-theoretic, model-theoretic, possible worlds, etc.) that comes with it. A very important feature indeed. But if we remove that machinery for a second, ontologies are just quite structured lexicons, which is often the actual meaning of "ontology" assumed by linked data people, who usually prefer the term "vocabulary" :). We all know that "ontology as controlled vocabulary" is a well-accepted meaning.
My constructive proposal is that a standard that accommodates any task that aggregates lexical and ontological knowledge should be able to express *both* the similarities and the differences between lexicons and ontologies.
To take the example from this thread, the synset wordnet3:synset-buy-verb-1 can be a meaning, just as e.g. the OWL class http://ontosem.org/#buy is. One may need to reason with wordnet3:synset-buy-verb-1 as a class if the case requires it (e.g. if a WSD is used for ontology learning), just as one may need to reason with http://ontosem.org/#buy as a word sense if a different case requires it (e.g. if ontology designers discuss the actual meaning of the http://ontosem.org/#buy class). I provide here a concrete example of the first case.
In a recent tool developed at STLab, called Tipalo [1], we derive OWL taxonomies from Wikipedia definitions extracted from page abstracts. In doing so we apply deep parsing that creates a DRT logical model from the NL definition, then we produce an OWL model from that, disambiguate class names to WordNet senses, and resolve as many individuals as possible to DBpedia. For example, if we ask Tipalo to produce an OWL model for the Wikipedia entity "Wind instrument" [2], the following definition is extracted:
"A wind instrument is a musical instrument that contains some type of resonator, in which a column of air is set into vibration by the player blowing into a mouthpiece set at the end of the resonator."
After the parse is produced, Tipalo extracts the relations that are appropriate for a taxonomy, resolves some names to DBpedia entities, and disambiguates some to WordNet (currently by using UKB), thereby asserting owl:equivalentClass axioms between classes extracted from the logical representation of the text and WordNet synsets. In OWL 2 semantics, this makes those synsets regular classes, and formal reasoning is enabled on them.
An OWL model for the example is produced and visualized in the enclosed picture:
<Wind_instrument.png>
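The effect of the equivalence axioms can be illustrated with a few triples. This is a minimal sketch, not Tipalo's output or implementation: the `ex:` and `wn30:` identifiers are made up for the example, and the code handles only a single entailment (propagating an asserted subclass axiom across owl:equivalentClass), not OWL 2 reasoning in general.

```python
RDFS_SUBCLASS = "rdfs:subClassOf"
OWL_EQUIV = "owl:equivalentClass"

# Hypothetical identifiers: "ex:" stands for classes extracted from the
# definition text, "wn30:" for WordNet synsets chosen by disambiguation.
triples = {
    ("ex:WindInstrument", RDFS_SUBCLASS, "ex:MusicalInstrument"),
    ("ex:WindInstrument", OWL_EQUIV, "wn30:synset-wind_instrument-noun-1"),
    ("ex:MusicalInstrument", OWL_EQUIV, "wn30:synset-musical_instrument-noun-1"),
}


def equivalents(cls, triples):
    """Return cls plus every class reachable through owl:equivalentClass,
    treating the property as symmetric and transitive."""
    seen = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p != OWL_EQUIV:
                continue
            for a, b in ((s, o), (o, s)):
                if a == current and b not in seen:
                    seen.add(b)
                    frontier.append(b)
    return seen


def inferred_subclass_pairs(triples):
    """Copy each asserted subclass axiom to all equivalents of both ends."""
    pairs = set()
    for s, p, o in triples:
        if p == RDFS_SUBCLASS:
            for sub in equivalents(s, triples):
                for sup in equivalents(o, triples):
                    pairs.add((sub, sup))
    return pairs
```

With these triples, the subclass relation asserted between the two text-derived classes also holds between the two synsets, which is the sense in which the synsets become regular classes.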
In my view, then, "direct reference" of lexical units to ontology classes is fine, provided that both lexical units and classes can be *equally* considered semiotic meanings, and can be made interoperable by doing something as simple as what we do in Tipalo.
For Piek, notice that this solution complies with my answers to your questions: if an aspect of lexical meaning is useful, integrate it in ontology-based models/reasoning; if new meanings are needed/discovered, just integrate them.
For Guido, the game we play in Senso Comune is a bit special, because by "ontology" we really mean a fully axiomatized foundational ontology, and of course we want to be careful in distinguishing meaning coming from a dictionary like De Mauro's and meaning coming from DOLCE. However, the ground similarity between those meanings is there, and nothing prevents us (in principle) from introducing into DOLCE a meaning derived from a dictionary word sense. Such provenance distinctions about authoritativeness, formal axiomatization, etc. can be preserved by adding some punning to classes and properties :).
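One possible reading of the punning idea, sketched with plain triples: in OWL 2 the same IRI may be used both as a class and as an individual, so provenance can be stated about the individual view without touching the class axioms. Every name below (`ex:Buy`, `ex:Transaction`, `ex:LexicallyDerivedMeaning`, `ex:derivedFromSense`, the sense identifier) is hypothetical and comes neither from DOLCE, nor from Senso Comune, nor from semiotics.owl.

```python
# All IRIs are made up for illustration.
triples = [
    # Class view: ordinary axioms used for reasoning.
    ("ex:Buy", "rdf:type", "owl:Class"),
    ("ex:Buy", "rdfs:subClassOf", "ex:Transaction"),
    # Individual view of the same IRI (punning): carries provenance only.
    ("ex:Buy", "rdf:type", "owl:NamedIndividual"),
    ("ex:Buy", "rdf:type", "ex:LexicallyDerivedMeaning"),
    ("ex:Buy", "ex:derivedFromSense", "ex:dictionary-sense-comprare-1"),
]


def punned_iris(triples):
    """Return the IRIs declared both as a class and as an individual."""
    classes = {s for s, p, o in triples
               if p == "rdf:type" and o == "owl:Class"}
    individuals = {s for s, p, o in triples
                   if p == "rdf:type" and o == "owl:NamedIndividual"}
    return classes & individuals


def provenance_of(iri, triples):
    """Return the dictionary senses recorded as the source of an IRI."""
    return [o for s, p, o in triples
            if s == iri and p == "ex:derivedFromSense"]
```

Here `punned_iris(triples)` contains `ex:Buy`, and `provenance_of("ex:Buy", triples)` returns the recorded dictionary sense, so the origin of the meaning stays visible next to its axiomatization.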
Ehm, now my message has grown substantially, but some time ago I promised to clarify my semiotics.owl pattern, so this is a way of doing it.
Ciao
Aldo
On 11 Aug 2012, at 09:33, Guido Vetere wrote:
Piek Vossen <piek.vossen@vu.nl> wrote on 10/08/2012 10.05.38:
> To: Guido Vetere/Italy/IBM@IBMIT
> cc: <public-ontolex@w3.org>
> Subject: Re: Meaning and Semiotics - Issues for Modelling
>
> Dear all,
>
> I would like to discuss this at another level. We should first
> answer the question:
>
> 1. Is there any semantic aspect of a word sense (I prefer lexical
> unit) that cannot be represented in an ontological model?
>
> It may not be easy but I think you can, if you allow semantics in
> the ontology that incorporates probabilities and prototypicality.
> I think that any formalization of lexical meaning can be turned into
> an ontological meaning, simply because it is a formalization.
> If it is not a formalization then the lexical meaning is ill-defined
> and we need to do more (empirical) work to learn about the word and
> its usage.
>
As far as we can formalize lexical meanings, we can
represent them in a formal way; this is true (by definition). But what
we can formalize, and how, is a very open issue in philosophy of language
and logic, respectively. Frege and Tarski warned about using formal logic
for modeling natural language, in vain. As a matter of fact, modern logicians
are still striving to look at linguistic phenomena under the lens of Truth,
which is quite problematic in many cases. In fact, we lack a generally
agreed (and positive) 'theory of meaning', and I'm afraid this is not
just a problem of 'empirical work'. Of course, we cannot solve philosophical
puzzles here, but I think that we should take them into account, somehow.
> 2. Do you want to model any semantic aspect that characterizes a
> word sense also in the ontology?
>
> This is another question. If we want to model pure logical
> reasoning, there may be many lexical aspects (not just the pragmatic
> knowledge) that we do not need in the ontology. We do not need to
> represent “buy” and “sell” separately to reason over the financial
> transaction process.
>
I agree, for most computational tasks, there would
be no need of representing any semantic aspect of a word sense, even if
it were possible.
> 3. What do we do in situations where lexicons are far
> richer than any available ontology, so that we cannot provide
> sufficient ontological labels to model the lexicons?
>
> This is a more practical and pragmatic question. If the lexicon is
> so large, complex and rich, why not use a two-layered solution where
> lexical relations take the burden off the ontology and the ontology
> takes the burden of deeper reasoning (need to define how deep we
> need to go). So in the lexicon, I can say that one word is the
> informal word for “eat” and another word is the neutral label for
> “eat”. In the ontology, we just have “eat”. Many lexicalized
> concepts are either pragmatic variants or can be defined using intersecting
> properties as described by Philipp for “bald”.
>
I like this idea of the 'two layers' very much: ontology
should allow reasoning on real world structures (e.g. parts, phases, etc.)
while lexica should account for linguistic habits and games. By the way,
Quine drew a line to distinguish 'ontology' (what is there) from 'ideology'
(the way we conceptualize it through language). Maybe we can start from
there ...
Regards,
Guido Vetere
Manager, Center for Advanced Studies IBM Italia
_________________________________________________
Rome: Via Sciangai 53, 00144 Roma, Italy, +39 (0)6 59662137
Trento: Via Sommarive 18, 38123 Povo in Trento, Italy, +39 (0)461 312312
Mobile: +39 3357454658
_________________________________________________
Aldo Gangemi
Senior Researcher
Semantic Technology Lab (STLab)
Institute for Cognitive Science and Technology,
National Research Council (ISTC-CNR)
Via Nomentana 56, 00161, Roma, Italy
Tel: +390644161535
Fax: +390644161513
skype aldogangemi