- From: John P. McCrae <jmccrae@cit-ec.uni-bielefeld.de>
- Date: Sat, 7 Jun 2014 00:21:17 +0200
- To: Armando Stellato <stellato@info.uniroma2.it>
- Cc: Manuel Fiorelli <fiorelli@info.uniroma2.it>, public-ontolex <public-ontolex@w3.org>
- Message-ID: <CAC5njqrOx-pxFh7Vz3aL6owat0cCKfu_9gHTA+3=Ob9wah90yw@mail.gmail.com>
On Fri, Jun 6, 2014 at 10:32 PM, Armando Stellato <stellato@info.uniroma2.it> wrote:

> Hi John
>
> The key idea behind the concept of void:Dataset is to provide metadata that gives useful information about the actual data it refers to. In a sense, a void:Dataset should provide information that helps to understand the usefulness of the data, to interpret the data, and so on.
>
> OK, so my question is then which triples belong to which section, if I have something typical like
>
>   :know a ontolex:LexicalEntry ;
>     ontolex:sense :know#Sense ;
>     ontolex:canonicalForm :know#Form .
>
>   :know#Form ontolex:writtenRep "know"@eng .
>
>   :know#Sense ontolex:reference foaf:knows .
>
> What is the lexicon and what is the lexicalization?
>
> *Armando*: whenever you have an attachment to the ontology, then that part (the sense) is part of the lexicalization. If you had WordNet instead, the synsets, which are not domain concepts but lexical units of meaning (ontolex:LexicalConcept), would be part of the Lexicon, and so would the senses between them and lexical entries. In that case, if you link wn:synsets to the ontology, you would have a LexicalLinkSet. If you still use WordNet words, but you create specific senses linking to the ontology, those links are the lexicalization. If you re-use wn:senses (not sure you want to do it, btw), those links between the wn:senses and the ontology realize the lexicalization.
>
> Furthermore, if I add something from the synsem module, e.g.,
>
>   :know synsem:synBehavior :know#Frame .
>
>   :know#Frame synsem:synArg :know#arg1 , :know#arg2 .
>
>   :know#Sense synsem:subjOfProp :know#arg1 ;
>     synsem:objOfProp :know#arg2 .
>
> Where does this belong?
>
> sorry, have to get familiar with this module before replying, and now it's 3:52AM here :D

OK, but we will have to have a clear implementable distinction when we release the model.
What you say makes sense but is too vague, and I am not confident it will apply well when unexpected use cases appear (as they always do).

> Furthermore, if I publish my data (ontology and lexicon) as a single file, then it makes it difficult for an end user to figure out which bit is which. VoID is much simpler and says that my dataset is described by either a SPARQL endpoint, a data dump, a root resource or a URI lookup; this seems hard to implement for tightly integrated ontology-lexica.
>
> *Armando*: erm... first, VoID is not that simple: linksets follow the same approach; they are conceptually separated, but usually part of the same physical dataset (not its void proxy). In VoID there is no "your dataset", as your SPARQL endpoint provides access to a dataset ("your", ok), which may be the combination of various datasets (including linksets), and be linked to other datasets, which you have to (minimally, in this case) describe as well.

True, but VoID is supposed to be a proxy for the physical datasets for the most part. Subsets are allowed (DBpedia has many, for example) but generally they are based on some clear distinction, whereas most ontology-lexica consist of multiple connected layers... I find the separation into individual datasets to be quite unnatural and difficult.

> - What is a "conceptualized linguistic resource"? This is not really clear to me.
>
> Not sure about the name, but the idea was to refer to any resource like WordNet: that is, a resource providing lexical concepts grouping semantically close senses of different words.
>
> I agree the term is not great. Perhaps we don't need to say this, I'm not sure what the value in having this class is.
>
> *Armando*: would cross-check it with Manuel, but think we could drop it (modulo the discussion between you and Philipp, but that is another story). Anyway, agree that the name is totally temporary IFF the class had to be kept.
>
> - How does a "lexical linkset" differ from a "linkset"? (i.e., do we need this class?)
>
> It is a specialization, that seemed useful to us, to highlight the "special nature" of the dataset for which we are providing links.
>
> OK, this seems kind of unnecessary, perhaps we should consider removing this too.
>
> *Armando*: Here I strongly disagree, or better, I am not 100% sure we have to express "that thing" in this way, but it seems we should maintain that idea for coherency with the rest of the model... let's look at the principle.
>
> In the core model, I asked to introduce ontolex:LexicalConcept, though I said I was myself not 100% adamant in defending its introduction, as maybe it was not saying much more than skos:Concept, except I wanted it to "tag" concepts which were really thought of as units of meaning for lexical resources (so, the result of a creative activity where the starting point is words, and then the creator wants to give meaning to them, and these concepts may have very fine granularities, in that they are not bound to simplifications which may be preferred in a thesaurus, but to intentions of even slight semantic inflections of the words). Initially it had been criticized, then it seems it somehow convinced the group of its sense.
>
> Well, then, to apply again a sort of "comparison theorem": they are at least as useful (in their respective domain, that is, metadata) as the LexicalConcept in the core model. Now, if you think LexicalConcepts are useless and want to go back to revising the core and remove them from the model, ok. Otherwise I really don't see why we should hide this aspect under the carpet.

I think my point is, how does a lexical linkset differ from a linkset? It seems solely interesting in that one or both sides of the linkset is a lexical resource...
to illustrate with an absurd example, if I linked a fishing ontology to a geo-ontology, I would not define it as a GeoFishingLinkSet, so why do I care that a linkset is lexical? I am not trying to be adversarial, it is just not clear to me at all.

> - Shouldn't there be an object property linking a lexicalization to an ontology?
>
> It is lexicalizedDataset. In our parlance, we use "dataset" to embrace both factual knowledge and domain descriptions.
>
> Why not just call the property *ontology* then? This is the onto-lex group, a lexicalization is between an ontology and a lexicon.
>
> *Armando*: Erm... the problem is that the "onto" part of ontolex is ambiguous. Ontology may mean very different things, but moving inside W3C standards, I would avoid saying that ontologies also comprise SKOS thesauri. Now, letting the whole thing be called ontolex for historical reasons may be right, but this should not affect the precision of our terminology with respect to the existing one.

Hmm... I still like to hope we are really dealing with ontologies in this group... I just find lexicalizedDataset to be quite confusing as a name. I think we can stay with *ontology* even if we allow a fairly wide definition of what an ontology actually is.

> - How do you count lexicalizations? i.e., is it the number of Lexicalization instances or the number of lexicalized reference/entry pairs?
>
> There is a slight ambiguity with regard to this. A Lexicalization is really a collection of reference/entry pairs, which are individually referred to as lexicalizations (uncapitalized initial).
>
> If this ambiguity is unacceptable, we could consider alternative names for the Lexicalization class. Perhaps LexicalMapping or LexicoSemanticMapping, or whatever sensible name.
>
> A reference/entry pair in the OntoLex model is called a Lexical Sense! So the lexicalizations and the senses property must count the same thing, right?
>
> *Armando*: left ;) see our section 6 and the email where we asked to vote on this (replied affirmatively by Philipp). In any case (modulo ambiguities in the meaning of "reference", where here Manuel meant it to be the ontological element being bound to the lexical entry) your statement is incorrect wrt the core module.
> What if I have two senses binding the same lex entry/onto resource? The count is then different.
> Unless we say this cannot happen (but AFAIGI, last time everybody agreed it can)

Good point, but I am not sure the case of same reference/same entry is so common that we need to note it... my fear is that it creates a lot of ambiguity when for 90% of models these values are the same. The question is, is it worth it for this one corner case?

> - What are the domains of the properties lexicalEntries, senses, references, etc.?
>
> In the owl file you should have the following information:
>
> - lexicalEntries -> Lexicalization or ResourceCoverage or Lexicon
> - senses -> Lexicalization or ResourceCoverage or Lexicon
> - lexicalizations -> Lexicalization or ResourceCoverage
> - references -> Lexicalization or ResourceCoverage
>
> So.. follow up question: If I can put the lexical entry count on the lexicalization object, what is the point of the resource coverage object?
>
> *Armando*: uhm... not sure. Manuel, did you put it to mean that a Lexicalization can also provide the total number of lexicalizations used by it? mmm, that would make sense. Think this is partially related to the ratio/integer. Independently of the coverage (and there can be many coverages, specifying coverage of various classes), it may be useful to provide the total number of lexicalizations used in a Lexicalization dataset. Obviously, if we don't use ratio, that number would be really equivalent to the number of lexs used in a ResourceCoverage with resource=owl:Thing.
>

OK, but there should be some property that distinguishes a resource coverage from a lexicalization and moreover it should be possible to have more than one ResourceCoverage, otherwise there is no need for an extra node.

Regards,
John

> - Shouldn't we also count LexicalConcepts and Forms?
>
> As I wrote in the previous email, we are open to suggestions about additional statistics.
>
> OK consider it suggested
>
> *Armando*: +1
>
> Warmest regards,
>
> Armando
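
The corner case raised in the thread (two senses binding the same lexical entry to the same ontology resource) can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the sense identifiers `:know#Sense1`, `:know#Sense2` and `:friend#Sense` are hypothetical, the data is held in plain Python tuples rather than an RDF store, and the two function names are made up here; none of this is part of the OntoLex or VoID vocabularies.

```python
# Hypothetical data: each tuple is (sense, lexical entry, ontology reference).
# All identifiers below are invented for illustration only.
SENSES = [
    (":know#Sense1", ":know", "foaf:knows"),
    (":know#Sense2", ":know", "foaf:knows"),   # same entry, same reference
    (":friend#Sense", ":friend", "foaf:knows"),
]


def count_senses(senses):
    """Number of distinct sense nodes (what a `senses` statistic would count)."""
    return len({sense for sense, _entry, _ref in senses})


def count_entry_reference_pairs(senses):
    """Number of distinct (entry, reference) pairs, i.e. "lexicalizations"
    in the uncapitalized meaning used in this thread."""
    return len({(entry, ref) for _sense, entry, ref in senses})


if __name__ == "__main__":
    print("senses:", count_senses(SENSES))
    print("entry/reference pairs:", count_entry_reference_pairs(SENSES))
```

With the data above the two counts differ because two senses share one entry/reference pair; removing the second tuple makes them agree, which is the common case referred to in the reply.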
Received on Friday, 6 June 2014 22:21:45 UTC