RE: Comments on lime.owl from Armando Stellato on 2014-06-06 (public-ontolex@w3.org from June 2014)

From: Armando Stellato <stellato@info.uniroma2.it>
Date: Fri, 6 Jun 2014 22:42:29 +0200
To: John McCrae <jmccrae@cit-ec.uni-bielefeld.de>, Manuel Fiorelli <fiorelli@info.uniroma2.it>
CC: public-ontolex <public-ontolex@w3.org>
Message-ID: <DUB124-W384AAF001B325849EC772FA02C0@phx.gbl>
oh, sorry,
forgot to complete the part about the difficulties with the three-datasets structure. As I said, first, VoID is not as simple as you depict it (see previous email), even when describing simple datasets; and second, you must consider the inherent complexity of the OntoLex core module, which then needs to be "metadataed".
With great complexity comes great responsibility (cit. Voltaire, or Spider-Man's uncle, if you prefer) ;)

From: stellato@info.uniroma2.it
To: jmccrae@cit-ec.uni-bielefeld.de; fiorelli@info.uniroma2.it
CC: public-ontolex@w3.org
Subject: RE: Comments on lime.owl
Date: Fri, 6 Jun 2014 22:32:02 +0200




Hi John

The key idea behind the concept of void:Dataset is to provide metadata that gives useful information about the actual data it refers to. In a sense, a void:Dataset should provide information that helps one understand the usefulness of the data, interpret the data, and so on.



OK, so my question is then which triples belong to which section, if I have something typical like
:know a ontolex:LexicalEntry ;
  ontolex:sense :know#Sense ;
  ontolex:canonicalForm :know#Form .
:know#Form ontolex:writtenRep "know"@eng .
:know#Sense ontolex:reference foaf:knows .
What is the lexicon and what is the lexicalization?
Armando: whenever you have an attachment to the ontology, that part (the sense) is part of the lexicalization. If you had WordNet instead, the synsets, which are not domain concepts but lexical units of meaning (ontolex:LexicalConcept), would be part of the Lexicon, and so would the senses between them and the lexical entries. In that case, if you link wn:synsets to the ontology, you would have a LexicalLinkSet. If you still use WordNet words but create specific senses linking to the ontology, those links are the lexicalization. If you re-use wn:senses (not sure you want to do it, btw), the links between the wn:senses and the ontology realize the lexicalization.
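
A minimal sketch of one possible reading of this rule, applied to the example triples above. It is an illustration only, not part of lime.owl: the `partition` helper is hypothetical, and it assumes that a sense counts as "attached to the ontology" when it is the subject of an ontolex:reference triple, and that every triple mentioning such a sense goes to the lexicalization.

```python
# The example triples, written as plain (subject, predicate, object) tuples.
TRIPLES = [
    (":know", "rdf:type", "ontolex:LexicalEntry"),
    (":know", "ontolex:sense", ":know#Sense"),
    (":know", "ontolex:canonicalForm", ":know#Form"),
    (":know#Form", "ontolex:writtenRep", '"know"@eng'),
    (":know#Sense", "ontolex:reference", "foaf:knows"),
]


def partition(triples):
    """Split triples into (lexicon, lexicalization) lists.

    Assumption (not stated in the thread): a sense is attached to the
    ontology when it is the subject of an ontolex:reference triple, and
    every triple that mentions such a sense belongs to the lexicalization.
    """
    attached = {s for s, p, _ in triples if p == "ontolex:reference"}
    lexicon, lexicalization = [], []
    for triple in triples:
        s, _, o = triple
        if s in attached or o in attached:
            lexicalization.append(triple)
        else:
            lexicon.append(triple)
    return lexicon, lexicalization


lexicon, lexicalization = partition(TRIPLES)
```

Under these assumptions the entry and its form stay in the lexicon, while the sense and its link to foaf:knows end up in the lexicalization.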
Furthermore, if I add something from the synsem module, e.g.,
:know synsem:synBehavior :know#Frame .
:know#Frame synsem:synArg :know#arg1 , :know#arg2 .
:know#Sense synsem:subjOfProp :know#arg1 ;  synsem:objOfProp :know#arg2 .
Where does this belong?







sorry, have to get familiar with this module before replying, and now it's 3:52AM here :D
Furthermore, if I publish my data (ontology and lexicon) as a single file, then it makes it difficult for an end user to figure out which bit is which. VoID is much simpler and says that my dataset is described by either a SPARQL endpoint, a data dump, a root resource or a URI lookup; this seems hard to implement for tightly integrated ontology-lexica.
Armando: erm... first, VoID is not that simple: linksets follow the same approach; they are conceptually separate, but usually part of the same physical dataset (not its VoID proxy). In VoID there is no "your dataset": your SPARQL endpoint provides access to a dataset ("yours", ok), which may be the combination of various datasets (including linksets) and may be linked to other datasets, which you have to describe as well (minimally, in this case).





What is a "conceptualized linguistic resource"? This is not really clear to me.

Not sure about the name, but the idea was to refer to any resource like WordNet, that is, a resource providing lexical concepts that group semantically close senses of different words.


I agree the term is not great. Perhaps we don't need to say this; I'm not sure what the value of having this class is.
Armando: would cross-check it with Manuel, but think we could drop it (modulo the discussion between you and Philipp, but that is another story). Anyway, agree that the name is totally temporary IFF the class had to be kept.



How does a "lexical linkset" differ from a "linkset"? (i.e., do we need this class?)


It is a specialization, that seemed useful to us, to highlight the "special nature" of the dataset for which we are providing links.



OK, this seems kind of unnecessary, perhaps we should consider removing this too.
Armando: Here I strongly disagree, or rather, I'm not 100% sure we have to express "that thing" in this way, but it seems we should maintain that idea for coherence with the rest of the model... let's look at the principle.
In the core model, I asked to introduce ontolex:LexicalConcept, though I said I was myself not 100% adamant in defending its introduction, as maybe it was not saying much more than skos:Concept, except that I wanted it to "tag" concepts which were really thought of as units of meaning for lexical resources (so, the result of a creative activity where the starting point is words, and then the creator wants to give meaning to them; these concepts may have very fine granularity, in that they are not bound to the simplifications which may be preferred in a thesaurus, but to the intention of capturing even slight semantic inflections of the words). Initially it was criticized, but then it seems to have somehow convinced the group of its sense.
Well, then, to apply again a sort of "comparison theorem": they are at least as useful (in their respective domain, that is, metadata) as LexicalConcept is in the core model. Now, if you think LexicalConcepts are useless and want to go back, revise the core and remove them from the model, ok. Otherwise I really don't see why we should sweep this aspect under the carpet.





Shouldn't there be an object property linking a lexicalization to an ontology?

It is lexicalizedDataset. In our parlance, we use "dataset" to embrace both factual knowledge and domain descriptions.


Why not just call the property ontology then? This is the onto-lex group, a lexicalization is between an ontology and a lexicon.
Armando: Erm... the problem is that the "onto" part of ontolex is ambiguous. "Ontology" may mean very different things, but staying within W3C standards, I would avoid saying that ontologies also comprise SKOS thesauri. Now, letting the whole thing be called ontolex for historical reasons may be right, but this should not affect the precision of our terminology wrt the existing one.


 
How do you count lexicalizations? i.e., is it the number of Lexicalization instances or the number of lexicalized reference/entry pairs?

There is a slight ambiguity with regard to this. A Lexicalization is really a collection of reference/entry pairs, which are individually referred to as lexicalizations (uncapitalized initial).




If this ambiguity is unacceptable, we could consider alternative names for the Lexicalization class. Perhaps LexicalMapping or LexicoSemanticMapping, or any other sensible name.


A reference/entry pair in the OntoLex model is called a Lexical Sense! So the lexicalizations and the senses property must count the same thing, right?
Armando: left ;) see our section 6 and the email where we asked to vote on this (answered affirmatively by Philipp). In any case (modulo ambiguities in the meaning of "reference", which Manuel here meant as the ontological element being bound to the lexical entry), your statement is incorrect wrt the core module. What if I have two senses binding the same lex entry/onto resource? The count is then different. Unless we say this cannot happen (but AFAIGI, last time everybody agreed it can).
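
To make the difference concrete, here is a small sketch with made-up data (the sense identifiers and the second entry are hypothetical, not taken from any real lexicon). It counts sense objects on one side and distinct lexical entry/ontology resource pairs on the other, assuming that two senses may bind the same pair.

```python
# Hypothetical data: each sense is (sense_id, lexical_entry, ontology_reference).
SENSES = [
    ("know_sense_1", ":know", "foaf:knows"),
    ("know_sense_2", ":know", "foaf:knows"),  # second sense, same entry/reference pair
    ("friend_sense_1", ":friend", "foaf:knows"),
]


def count_senses(senses):
    """Number of distinct sense objects."""
    return len({sense_id for sense_id, _, _ in senses})


def count_lexicalizations(senses):
    """Number of distinct (lexical entry, ontology reference) pairs."""
    return len({(entry, ref) for _, entry, ref in senses})


sense_count = count_senses(SENSES)
lexicalization_count = count_lexicalizations(SENSES)
```

With this data there are three senses but only two entry/reference pairs, so the two statistics diverge as soon as a pair is bound by more than one sense.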


 



What are the domains of the properties lexicalEntries, senses, references, etc.?

In the OWL file you should have the following information:
lexicalEntries -> Lexicalization or ResourceCoverage or Lexicon
senses -> Lexicalization or ResourceCoverage or Lexicon
lexicalizations -> Lexicalization or ResourceCoverage
references -> Lexicalization or ResourceCoverage

So... follow-up question: If I can put the lexical entry count on the lexicalization object, what is the point of the resource coverage object?
Armando: uhm... not sure. Manuel, did you put it to mean that a Lexicalization can also provide the total number of lexicalizations used by it? Mmm, that would make sense. I think this is partially related to the ratio/integer question. Independently of the coverage (and there can be many coverages, specifying coverage of various classes), it may be useful to provide the total number of lexicalizations used in a Lexicalization dataset. Obviously, if we don't use ratios, that number would be equivalent to the number of lexicalizations used in a ResourceCoverage with resource=owl:Thing.
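
A rough sketch of that equivalence, using integer counts only. All names and data here are hypothetical (the ex: resources, the entries, and the `coverage_count` helper are not part of lime.owl), and it assumes that every lexicalized resource is treated as an instance of owl:Thing.

```python
# Hypothetical lexicalized (lexical entry, ontology resource) pairs.
PAIRS = [
    (":Rome_entry", "ex:rome"),
    (":Roma_entry", "ex:rome"),
    (":Italy_entry", "ex:italy"),
    (":Tiber_entry", "ex:tiber"),
]

# Hypothetical class membership of each lexicalized resource.
TYPES = {
    "ex:rome": {"ex:City"},
    "ex:italy": {"ex:Country"},
    "ex:tiber": {"ex:River"},
}


def coverage_count(pairs, types, resource):
    """Count lexicalizations whose reference is an instance of `resource`.

    Assumption: owl:Thing covers every resource, so it matches all pairs.
    """
    return sum(
        1
        for _, ref in pairs
        if resource == "owl:Thing" or resource in types.get(ref, set())
    )


total = len(set(PAIRS))
city_coverage = coverage_count(PAIRS, TYPES, "ex:City")
thing_coverage = coverage_count(PAIRS, TYPES, "owl:Thing")
```

Here the per-class coverage for ex:City is smaller than the total, while the owl:Thing coverage coincides with the total number of lexicalizations.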



Shouldn't we also count LexicalConcepts and Forms?

As I wrote in the previous email, we are open to suggestions about additional statistics.

OK, consider it suggested
Armando: +1
Warmest regards,
Armando
Received on Friday, 6 June 2014 20:42:59 UTC