Re: Comments on lime.owl from Philipp Cimiano on 2014-06-12 (public-ontolex@w3.org from June 2014)

From: Philipp Cimiano <cimiano@cit-ec.uni-bielefeld.de>
Date: Fri, 13 Jun 2014 00:26:40 +0200
To: public-ontolex@w3.org
Message-ID: <539A2920.2070206@cit-ec.uni-bielefeld.de>
Dear Armando, John, Manuel, all,

  I am trying hard to follow the discussion without being lost in too 
many details ;-)

Let me try to recap. I think we should forget for now about "Lexicon" 
and "Lexicalization" and clarify what we mean really and then find a name.

So, I think we are talking about packaging an ontolex model into 
different (logical) subsets.
This makes sense as one can attach different metadata to each of these 
subdatasets, which is I think the key point we are after.

Now I find the question of "what triples belong where" difficult if not 
impossible to answer. As John says a lexicon consists of many different 
layers of entries, senses, references, forms. So how people will package 
is quite arbitrary.

I thus find it difficult to answer the question of what should go where, 
as there is no natural way to package the triples in a lexicon.

The reason why we want to package things is to have metadata at the 
level of the complete resource (Lexicon as a collection of lex entries 
and Lexicalization as a collection of senses).

In addition, we want to group parts of the lexicon that refer to a given 
dataset/ontology to say something like how many references for this 
given ontology are lexicalized in the lexicon; fair enough. Let's accept 
this is important (which I agree actually)

So I think what we are looking for is something like a class 
"SubsetOfLexiconRelevantReferringToAParticularOntology" as subclass of 
void:Dataset and representing a slice of the overall dataset. For this 
class we would have a property lime:dataset or lime:ontology (I propose 
lime:vocabulary which is more neutral) that expresses the ontology in 
question and would be functional, otherwise it makes no sense. We could 
then attach the standard metadata properties capturing statistics to the 
Lexicon as a whole, to the whole dataset (comprising possibly many 
lexica) or to the subset of lex / sense / ref triples with a particular 
"ref". So SubsetOfLexiconRelevantReferringToAParticularOntology(o) as a 
function would refer as a Dataset to all the lex / sense / ref triples 
where ref is in o. Fair enough. We could have this class but never say 
explicitly which triples belong to it, but keep it as some implicit 
subset of the lexicon. This would allow us to make metadata statements 
about all the sense / ref triples with ref \in o. Let me refer to these 
tuples as (lex,sense,ref) for now.

The properties we want to have for the Dataset 
SubsetOfLexiconRelevantReferringToAParticularOntology(o) are:

entries:  #{ lex : (lex,sense,ref) \in o }
senses:  #{ sense : (lex,sense,ref) \in o }
lexicalizations:  #{ (lex,ref) : (lex,sense,ref) \in o }
references:  #{ ref : (lex,sense,ref) \in o }
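
To make the four counts concrete, here is a minimal Python sketch of how 
they could be computed over a list of (lex,sense,ref) tuples. The names 
Link and slice_counts and the sample data are hypothetical and only for 
illustration; they are not part of lime or ontolex. The sample includes 
two senses binding the same entry to the same reference, the case Armando 
raised, to show where "senses" and "lexicalizations" diverge.

```python
from typing import Dict, Iterable, NamedTuple, Set


class Link(NamedTuple):
    """One (lex, sense, ref) tuple; a hypothetical helper type."""
    lex: str    # lexical entry
    sense: str  # lexical sense
    ref: str    # ontology entity the sense refers to


def slice_counts(links: Iterable[Link], vocabulary: Set[str]) -> Dict[str, int]:
    """Count distinct items in the slice of `links` whose ref is in `vocabulary`."""
    in_slice = [link for link in links if link.ref in vocabulary]
    return {
        "entries": len({link.lex for link in in_slice}),
        "senses": len({link.sense for link in in_slice}),
        "lexicalizations": len({(link.lex, link.ref) for link in in_slice}),
        "references": len({link.ref for link in in_slice}),
    }


# Assumed sample data: the vocabulary "o" is given as a set of entity names.
FOAF = {"foaf:knows", "foaf:Person"}
links = [
    Link(":know", ":know#Sense1", "foaf:knows"),
    Link(":know", ":know#Sense2", "foaf:knows"),   # same entry/ref, second sense
    Link(":person", ":person#Sense", "foaf:Person"),
    Link(":cat", ":cat#Sense", "dbo:Cat"),         # ref outside the vocabulary
]

counts = slice_counts(links, FOAF)
print(counts)
```

With this sample the slice has three senses but only two lexicalizations, 
since the two senses of :know collapse into a single (lex,ref) pair.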

So a question to you all:

Is "SubsetOfLexiconRelevantForAParticularOntology" the kind of thing we 
want to have to pick out that subset of the overall dataset that refers 
to a given ontology?

If yes, we can discuss the name further, but we need to agree on the 
concept.

The other issue is that we might have lexical resources that do not 
introduce any lexical entries but only link lexical entries in one 
resource to entities in some ontology, but do not contain any lexicon 
nor lexical entries themselves.

Clearly, there might be hybrids in the general case, resources that both 
link only but also introduce some lexical entries. This might be the 
standard case.

So not sure if we want to specifically tag resources that *only* link 
but do not introduce lexical entries nor lexica themselves. We can do it 
and call this type of dataset LemonLinkSet, while we could call the 
other ones simply "LemonDataset". If people feel this is important we 
can certainly do it, but I do not yet see the added value that clearly.

Ok, so far so good. For me there is a clear picture emerging, but we 
need to agree on it ;-)
The other things are minor details that will follow from our stance 
towards the issues I mentioned above I think.

Best regards,

Philipp.

Am 07.06.14 00:21, schrieb John P. McCrae:
>
>
>
> On Fri, Jun 6, 2014 at 10:32 PM, Armando Stellato 
> <stellato@info.uniroma2.it> wrote:
>
>         Hi John
>
>
>         The key idea behind the concept of void:Dataset is to provide
>         metadata that provide useful information about the actual data
>         they refer to. In a sense, a void:Dataset should provide
>         information that help to understand the usefulness of the
>         data, to interpret the data, and so on.
>
>
>             OK, so my question is then which triples belong to which
>             section, if I have something typical like
>
>
>             :know a ontolex:LexicalEntry ;
>
>             ontolex:sense :know#Sense ;
>
>             ontolex:canonicalForm :know#Form .
>
>
>             :know#Form ontolex:writtenRep "know"@eng
>
>
>             :know#Sense ontolex:reference foaf:knows
>
>
>             What is the lexicon and what is the lexicalization?
>
>     *Armando*: whenever you have an attachment to the ontology, then
>     that part (the sense) is part of the lexicalization. If you had
>     WordNet instead, the synsets, which are not domain concepts, but
>     lexical units of meaning (ontolex:LexicalConcept) would be part of
>     the Lexicon, and so the senses between them and lexical entries.
>     In that case, if you link wn:synsets to the ontology, you would
>     have a LexicalLinkSet. If you still use wordnet words, but you
>     create specific senses linking to the ontology, those links are
>     the lexicalization. If you re-use wn:senses (not sure you want to
>     do it, btw), those links between the wn:senses and the ontology
>     realize the lexicalization.
>
>
>             Furthermore, if I add something from the synsem module, e.g.,
>
>
>             :know synsem:synBehavior :know#Frame .
>
>
>             :know#Frame synsem:synArg :know#arg1 , :know#arg2 .
>
>
>             :know#Sense synsem:subjOfProp :know#arg1 ;
>
>               synsem:objOfProp :know#arg2 .
>
>
>             Where does this belong?
>
>
>     sorry, have to get familiar with this module before replying, and
>     now it's 3:52AM here :D
>
> OK, but we will have to have a clear implementable distinction when we 
> release the model. What you say makes sense but is too vague, and I am 
> not confident it will apply well when unexpected use cases appear (as 
> they always do).
>
>
>             Furthermore, if I publish my data (ontology and lexicon)
>             as a single file, then it makes it difficult for an end
>             user to figure out which bit is which. VoID is much
>             simpler and says that my dataset is described by either a
>             SPARQL endpoint, a data dump, a root resource or a URI
>             lookup; this seems hard to implement for tightly
>             integrated ontology-lexica.
>
>     *Armando*:  erm... first of all, void is not that simple: linksets
>     follow the same approach, they are conceptually separated, but
>     usually part of the same physical dataset (not its void proxy). In
>     void: there is no "your dataset", as your sparql endpoint provides
>     access to a dataset ("your", ok), which may be the combination of
>     various datasets (including linksets), and be linked to other
>     datasets, which you have to (minimally, in this case) describe as
>     well.
>
> True, but void is supposed to be a proxy for the physical datasets in 
> the most part. Subsets are allowed (DBpedia has many for example) but 
> generally they are based on some clear distinction, whereas most 
> ontology-lexicon consist of multiple connected layers... I find the 
> separation into individual datasets to be quite unnatural and difficult.
>
>
>
>               * What is a "conceptualized linguistic resource"? This
>                 is not really clear to me.
>
>         Not sure about the name, but the idea was to refer to any
>         resource like WordNet: that is a resource providing lexical
>         concepts grouping semantically close senses of different words.
>
>     I agree the term is not great. Perhaps we don't need to say this,
>     I'm not sure what the value in having this class is.
>
>     *Armando*: would cross-check it with Manuel, but think we could
>     drop it (modulo the discussion between you and Philipp, but that
>     is another story). Anyway, agree that the name is totally
>     temporary IFF the class had to be kept.
>
>               * How does a "lexical linkset" differ from a "linkset"?
>                 (i.e, do we need this class?)
>
>         It is a specialization, that seemed useful to us, to highlight
>         the "special nature" of the dataset for which we are providing
>         links.
>
>
>             OK, that seems kind of unnecessary, perhaps we should
>             consider removing this too.
>
>
>     *Armando*: Here I strongly disagree, or better, not 100% sure if
>     we have to express "that thing" in this way, but, it seems we
>     should maintain that idea for coherency with the rest of the
>     model...let's look at the principle.
>
>     in the core model, I asked to introduce ontolex:LexicalConcept,
>     though said I was myself not 100% adamant in defending its
>     introduction, as maybe it was not saying that more over
>     skos:Concept, except I wanted it to "tag" concepts which were
>     really thought as units of meaning for lexical resources (so, the
>     result of a creative activity where the starting point are words,
>     and then the creator wants to give meaning to them, and these
>     concepts may have very fine granularities, in that they are not
>     bound to simplifications which may be preferred in a thesaurus,
>     but to intentions of even slight semantic inflections of the
>     words). Initially it had been criticized, then it seems somehow
>     convinced the group of its sense.
>
>     Well, then, to apply again a sort of "comparison theorem": they
>     are at least as useful(in their respective domain, that is,
>     metadata) as the LexicalConcept in the core model. Now, if you
>     think LexicalConcept are useless and want to go back on revising
>     the core and remove them from the model, ok. Otherwise I really
>     don't see why we should hide this aspect under the carpet.
>
> I think my point is, how does a lexical linkset differ from a linkset? 
> It seems solely interesting in that one or both sides of the linkset is 
> a lexical resource... to illustrate with an absurd example if I linked 
> a fishing ontology to geo-ontology, I would not define it as a 
> GeoFishingLinkSet, so why do I care that a link set is lexical? I am 
> not trying to be adversarial, it is just not clear to me at all.
>
>
>
>               * Shouldn't there be an object property linking a
>                 lexicalization to an ontology?
>
>         It is lexicalizedDataset. In our parlance, we refer to dataset
>         to embrace both factual knowledge and domain descriptions.
>
>
>             Why not just call the property /ontology/ then? This is
>             the onto-lex group, a lexicalization is between an
>             ontology and a lexicon.
>
>
>     *Armando*: Erm...the problem is that the "onto" part of ontolex is
>     ambiguous. ontology may mean very different things, but moving
>     inside w3c standards, I would avoid to tell ontologies comprise
>     also skos thesauri. Now, letting the whole thing be called
>     ontolex for historical reasons may be right, but this should not
>     affect the precision of our terminology wrt existing one.
>
> Hmm... I still like to hope we are really dealing with ontologies in 
> this group... I just find lexicalizedDataset to be quite confusing as 
> a name. I think we can stay with /ontology /even if we allow a fairly 
> wide definition of what an ontology actually is.
>
>
>               * How do you count lexicalizations? i.e., is it the
>                 number of Lexicalization instances or the number of
>                 lexicalized reference/entry pairs.
>
>         There is a slight ambiguity with regard to this. A
>         Lexicalization is really a collection of reference/entry
>         pairs, which are individually referred to as lexicalizations
>         (uncapitalized initial).
>
>         If this ambiguity is unacceptable, we could consider
>         alternative names for the Lexicalization class. Perhaps,
>         LexicalMapping or LexicoSemanticMapping, or whatever sensible
>         name.
>
>
>         A reference/entry pair in the OntoLex model is called a
>         Lexical Sense! So the lexicalizations and the senses property
>         must count the same thing, right?
>
>     *Armando*: left ;) see our section 6 and the email where we asked
>     to vote this (replied affirmatively by Philipp). In any case
>     (modulo ambiguities in the meaning of "reference", where here
>     Manuel meant it to be the ontological element being bound to the
>     lexical entry) your statement is incorrect wrt the core module.
>     What if I have two senses binding the same lex entry/onto
>     resource? the count is then different.
>     Unless we say this cannot happen (but AFAIGI, last time everybody
>     agreed it can)
>
> Good point, but I am not sure the case of same reference/same entry is 
> so common that we need to note it... my fear is that it creates a lot 
> of ambiguity when for 90% of models these values are the same. The 
> question is, is it worth it for this one corner case?
>
>
>               * What are the domains of the properties lexicalEntries,
>                 senses, references, etc.?
>
>         In the owl file you should have the following information:
>
>           * lexicalEntries -> Lexicalization or ResourceCoverage or
>             Lexicon
>           * senses -> Lexicalization or ResourceCoverage or Lexicon
>           * lexicalizations -> Lexicalization or ResourceCoverage
>           * references -> Lexicalization or ResourceCoverage
>
>     So.. follow up question: If I can put the lexical entry count on
>     the lexicalization object, what is the point of the resource
>     coverage object?
>
>     *Armando*: uhm...not sure. Manuel did you put it to mean that a
>     Lexicalization can also provide the total number of
>     lexicalizations used by it? mmm, that would make sense. Think this
>     is partially related to the ratio/integer. Independently of the
>     coverage (and there can be many coverages, specifying coverage of
>     various classes), it may be useful to provide the total number of
>     lexicalizations used in a Lexicalization dataset. Obviously, if we
>     don't use ratio, that number would be really equivalent to the
>     number of lexs used in a ResourceCoverage with resource=owl:Thing.
>
> OK, but there should be some property that distinguishes a resource 
> coverage from a lexicalization and moreover it should be possible to 
> have more than one ResourceCoverage, otherwise there is no need for an 
> extra node.
>
> Regards,
> John
>
>
>               * Shouldn't we also count LexicalConcepts and Forms?
>
>         As I wrote in the previous email, we are open to suggestions
>         about additional statistics.
>
>     OK consider it suggested
>
>     *Armando*: +1
>
>     Warmest regards,
>
>     Armando
>
>


-- 

Prof. Dr. Philipp Cimiano

Phone: +49 521 106 12249
Fax: +49 521 106 12412
Mail: cimiano@cit-ec.uni-bielefeld.de

Forschungsbau Intelligente Systeme (FBIIS)
Raum 2.307
Universität Bielefeld
Inspiration 1
33619 Bielefeld
Received on Thursday, 12 June 2014 22:27:12 UTC