Re: Comments on lime.owl

On Fri, Jun 6, 2014 at 10:32 PM, Armando Stellato <stellato@info.uniroma2.it> wrote:

> Hi John
>
>
> The key idea behind the concept of void:Dataset is to provide metadata
> that provide useful information about the actual data they refer to. In a
> sense, a void:Dataset should provide information that helps to understand
> the usefulness of the data, to interpret the data, and so on.
>
>
> OK, so my question is then which triples belong to which section, if I
> have something typical like
>
>
> :know a ontolex:LexicalEntry ;
>
>   ontolex:sense :know#Sense ;
>
>   ontolex:canonicalForm :know#Form .
>
>
> :know#Form ontolex:writtenRep "know"@eng .
>
>
> :know#Sense ontolex:reference foaf:knows .
>
>
> What is the lexicon and what is the lexicalization?
>
> *Armando*: whenever you have an attachment to the ontology, then that
> part (the sense) is part of the lexicalization. If you had WordNet instead,
> the synsets, which are not domain concepts, but lexical units of meaning
> (ontolex:LexicalConcept) would be part of the Lexicon, and so the senses
> between them and lexical entries. In that case, if you link wn:synsets to
> the ontology, you would have a LexicalLinkSet. If you still use wordnet
> words, but you create specific senses linking to the ontology, those links
> are the lexicalization. If you re-use wn:senses (not sure you want to do
> it, btw), those links between the wn:senses and the ontology realize the
> lexicalization.
>
>
> Furthermore, if I add something from the synsem module, e.g.,
>
>
> :know synsem:synBehavior :know#Frame .
>
>
> :know#Frame synsem:synArg :know#arg1 , :know#arg2 .
>
>
> :know#Sense synsem:subjOfProp :know#arg1 ;
>
>   synsem:objOfProp :know#arg2 .
>
>
> Where does this belong?
>
>
> sorry, have to get familiar with this module before replying, and now it's
> 3:52AM here :D
>
OK, but we will have to have a clear implementable distinction when we
release the model. What you say makes sense but is too vague, and I am not
confident it will apply well when unexpected use cases appear (as they
always do).
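
To make the question concrete, here is a minimal sketch of one possible
reading of the rule above. It treats triples as plain tuples, and the
decision that a triple goes to the lexicalization when its subject or object
is a sense carrying an ontolex:reference is my own assumption, not something
the model defines. The partition function name is hypothetical too.

```python
REFERENCE = "ontolex:reference"

# The example entry from above, written as (subject, predicate, object) tuples.
triples = [
    (":know", "rdf:type", "ontolex:LexicalEntry"),
    (":know", "ontolex:sense", ":know#Sense"),
    (":know", "ontolex:canonicalForm", ":know#Form"),
    (":know#Form", "ontolex:writtenRep", '"know"@eng'),
    (":know#Sense", "ontolex:reference", "foaf:knows"),
]


def partition(triples):
    """Split triples into (lexicon, lexicalization) under an assumed rule."""
    # Senses attached to the ontology through ontolex:reference.
    attached = {s for s, p, o in triples if p == REFERENCE}
    lexicon, lexicalization = [], []
    for triple in triples:
        s, p, o = triple
        # Assumption: any triple touching an attached sense is lexicalization.
        if s in attached or o in attached:
            lexicalization.append(triple)
        else:
            lexicon.append(triple)
    return lexicon, lexicalization


lexicon, lexicalization = partition(triples)
print(len(lexicon), len(lexicalization))
```

Even here a choice had to be made for the ontolex:sense triple, which links
the entry to the sense and could arguably sit on either side.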

>
> Furthermore, if I publish my data (ontology and lexicon) as a single file,
> then it makes it difficult for an end user to figure out which bit is
> which. VoID is much simpler and says that my dataset is described by either
> a SPARQL endpoint, a data dump, a root resource or a URI lookup; this seems
> hard to implement for tightly integrated ontology-lexica.
>
> *Armando*: erm... first of all, void is not that simple: linksets follow
> the same approach, they are conceptually separated, but usually part of
> the same physical dataset (not its void proxy). In void: there is no "your
> dataset", as your sparql endpoint provides access to a dataset ("your",
> ok), which may be the combination of various datasets (including linksets),
> and be linked to other datasets, which you have to (minimally, in this
> case) describe as well.
>
True, but void is supposed to be a proxy for the physical datasets for the
most part. Subsets are allowed (DBpedia has many for example) but generally
they are based on some clear distinction, whereas most ontology-lexica
consist of multiple connected layers... I find the separation into
individual datasets to be quite unnatural and difficult.

>
>
>
>    - What is a "conceptualized linguistic resource"? This is not really
>    clear to me.
>
> Not sure about the name, but the idea was to refer to any resource like
> WordNet: that is a resource providing lexical concepts grouping
> semantically close senses of different words.
>
> I agree the term is not great. Perhaps we don't need to say this, I'm not
> sure what the value in having this class is.
>
> *Armando*: would cross-check it with Manuel, but think we could drop it
> (modulo the discussion between you and Philipp, but that is another story).
> Anyway, agree that the name is totally temporary IFF the class had to be
> kept.
>
>
>    - How does a "lexical linkset" differ from a "linkset"? (i.e., do we
>    need this class?)
>
> It is a specialization, that seemed useful to us, to highlight the
> "special nature" of the dataset for which we are providing links.
>
>
> OK, this seems kind of unnecessary, perhaps we should consider removing
> this too.
>
>
> *Armando*: Here I strongly disagree, or better, not 100% sure if we have
> to express "that thing" in this way, but, it seems we should maintain that
> idea for coherence with the rest of the model...let's look at the principle.
>
> in the core model, I asked to introduce ontolex:LexicalConcept, though
> said I was myself not 100% adamant in defending its introduction, as maybe
> it was not saying much more than skos:Concept, except I wanted it to "tag"
> concepts which were really thought as units of meaning for lexical
> resources (so, the result of a creative activity where the starting point
> are words, and then the creator wants to give meaning to them, and these
> concepts may have very fine granularities, in that they are not bound to
> simplifications which may be preferred in a thesaurus, but to intentions of
> even slight semantic inflections of the words). Initially it had been
> criticized, then it seems it somehow convinced the group that it makes sense.
>
> Well, then, to apply again a sort of "comparison theorem": they are at
> least as useful (in their respective domain, that is, metadata) as the
> LexicalConcept in the core model. Now, if you think LexicalConcepts are
> useless and want to go back to revising the core and remove them from the
> model, ok. Otherwise I really don't see why we should hide this aspect
> under the carpet.
>
I think my point is, how does a lexical linkset differ from a linkset? It
seems solely interesting in that one or both sides of the linkset is a
lexical resource... to illustrate with an absurd example, if I linked a
fishing ontology to a geo-ontology, I would not define it as a
GeoFishingLinkSet, so why do I care that a link set is lexical? I am not
trying to be adversarial, it is just not clear to me at all.

>
>
>
>    - Shouldn't there be an object property linking a lexicalization to an
>    ontology?
>
> It is lexicalizedDataset. In our parlance, we refer to dataset to embrace
> both factual knowledge and domain descriptions.
>
>
> Why not just call the property *ontology* then? This is the onto-lex
> group, a lexicalization is between an ontology and a lexicon.
>
>
> *Armando*: Erm...the problem is that the "onto" part of ontolex is
> ambiguous. Ontology may mean very different things, but moving inside w3c
> standards, I would avoid saying that ontologies also comprise skos thesauri.
> Now, letting the whole thing be called ontolex for historical reasons
> may be right, but this should not affect the precision of our terminology
> wrt the existing one.
>
Hmm... I still like to hope we are really dealing with ontologies in this
group... I just find lexicalizedDataset to be quite confusing as a name. I
think we can stay with *ontology* even if we allow a fairly wide definition
of what an ontology actually is.

>
>
>    - How do you count lexicalizations? i.e., is it the number of
>    Lexicalization instances or the number of lexicalized reference/entry pairs?
>
> There is a slight ambiguity with regard to this. A Lexicalization is
> really a collection of reference/entry pairs, which are individually
> referred to as lexicalizations (uncapitalized initial).
>
> If this ambiguity is unacceptable, we could consider alternative names for
> the Lexicalization class. Perhaps, LexicalMapping or LexicoSemanticMapping,
> or whatever sensible name.
>
>
> A reference/entry pair in the OntoLex model is called a Lexical Sense! So
> the lexicalizations and the senses property must count the same thing,
> right?
>
> *Armando*: left ;) see our section 6 and the email where we asked to vote
> this (replied affirmatively by Philipp). In any case (modulo ambiguities in
> the meaning of "reference", where here Manuel meant it to be the
> ontological element being bound to the lexical entry) your statement is
> incorrect wrt the core module.
> What if I have two senses binding the same lex entry/onto resource? The
> count is then different.
> Unless we say this cannot happen (but AFAIGI, last time everybody agreed
> it can)
>
Good point, but I am not sure the case of same reference/same entry is so
common that we need to note it... my fear is that it creates a lot of
ambiguity when for 90% of models these values are the same. The question
is, is it worth it for this one corner case?
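
As a minimal sketch of where the two counts diverge, assuming triples are
held as plain tuples (the data and the count_senses_and_pairs helper are
invented for illustration and are not part of any module):

```python
SENSE = "ontolex:sense"
REFERENCE = "ontolex:reference"

# Hypothetical data: :know has two senses that both point at foaf:knows.
triples = [
    (":know", SENSE, ":know#Sense1"),
    (":know", SENSE, ":know#Sense2"),
    (":know#Sense1", REFERENCE, "foaf:knows"),
    (":know#Sense2", REFERENCE, "foaf:knows"),
    (":friend", SENSE, ":friend#Sense"),
    (":friend#Sense", REFERENCE, "foaf:knows"),
]


def count_senses_and_pairs(triples):
    """Return (number of senses, number of distinct entry/reference pairs)."""
    entry_of = {o: s for s, p, o in triples if p == SENSE}
    reference_of = {s: o for s, p, o in triples if p == REFERENCE}
    # Only senses that have both an entry and a reference are counted.
    senses = [sense for sense in entry_of if sense in reference_of]
    pairs = {(entry_of[sense], reference_of[sense]) for sense in senses}
    return len(senses), len(pairs)


sense_count, pair_count = count_senses_and_pairs(triples)
print(sense_count, pair_count)  # prints "3 2"
```

The counts only differ because of the duplicated :know/foaf:knows pair;
without it both would be the same.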

>
>
>
>
>    - What are the domains of the properties lexicalEntries, senses,
>    references, etc.?
>
> In the owl file you should have the following information:
>
>    - lexicalEntries -> Lexicalization or ResourceCoverage or Lexicon
>    - senses -> Lexicalization or ResourceCoverage or Lexicon
>    - lexicalizations -> Lexicalization or ResourceCoverage
>    - references -> Lexicalization or ResourceCoverage
>
> So.. follow up question: If I can put the lexical entry count on the
> lexicalization object, what is the point of the resource coverage object?
>
> *Armando*: uhm...not sure. Manuel, did you put it to mean that a
> Lexicalization can also provide the total number of lexicalizations used by
> it? mmm, that would make sense. Think this is partially related to the
> ratio/integer. Independently of the coverage (and there can be many
> coverages, specifying coverage of various classes), it may be useful to
> provide the total number of lexicalizations used in a Lexicalization
> dataset. Obviously, if we don't use ratio, that number would be really
> equivalent to the number of lexs used in a ResourceCoverage with
> resource=owl:Thing.
>
OK, but there should be some property that distinguishes a resource
coverage from a lexicalization and moreover it should be possible to have
more than one ResourceCoverage, otherwise there is no need for an extra
node.

Regards,
John

>
>
>    - Shouldn't we also count LexicalConcepts and Forms?
>
> As I wrote in the previous email, we are open to suggestions about
> additional statistics.
>
> OK consider it suggested
>
> *Armando*: +1
>
> Warmest regards,
>
> Armando
>

Received on Friday, 6 June 2014 22:21:45 UTC