Re: Comments on lime.owl from John P. McCrae on 2014-06-06 (public-ontolex@w3.org from June 2014)

From: John P. McCrae <jmccrae@cit-ec.uni-bielefeld.de>
Date: Fri, 6 Jun 2014 20:16:23 +0200
To: Manuel Fiorelli <fiorelli@info.uniroma2.it>
Cc: public-ontolex <public-ontolex@w3.org>
Message-ID: <CAC5njqrqyLL41eV-BKvd1qkCxj0AtFtVn+apeM9u2fqGseDRgg@mail.gmail.com>
On Fri, Jun 6, 2014 at 2:44 PM, Manuel Fiorelli <fiorelli@info.uniroma2.it>
wrote:

> Hi John,
>
> see my answers below.
>
> 2014-06-06 13:49 GMT+02:00 John P. McCrae <jmccrae@cit-ec.uni-bielefeld.de>:
>
>> Hi Manuel, Armando, all,
>>
>> Some comments on the lime.owl file
>>
>>    - Should Lexicon, Lexicalization and co. really be subclasses of
>>    void:Dataset? A void:Dataset is defined as a "set of RDF triples that are
>>    published, maintained or aggregated by a single provider". Thus, it seems
>>    that many lexica and lexicalizations can be in the same dataset and,
>>    conversely, it is very hard to define which triples are in a lexicon (for
>>    example, interlingual links are shared between two lexica). It would make
>>    more sense to me to have lexica, lexicalizations, etc., as part of a
>>    dataset, but not as datasets themselves.
>>
> We have in principle three distinct datasets: the dataset being
> lexicalized, the lexicon providing the vocabulary and the lexicalization
> which relates them.
>
> These distinctions do not entail that the datasets are really disjoint. In
> fact, the VoID vocabulary introduces the subset relation, which relates a
> dataset to its parts. For instance, we could have a dataset that has
> different subsets, corresponding to different linksets with other datasets.
> In this scenario, the dataset and its linksets may share the same SPARQL
> endpoint. However, knowing in advance that there exist some subsets that
> provide interlinks may be useful: that is exactly the reason we use the LOD
> cloud diagram.
>
> The key idea behind the concept of void:Dataset is to provide metadata
> that gives useful information about the actual data it refers to. In a
> sense, a void:Dataset should provide information that helps to understand
> the usefulness of the data, to interpret the data, and so on.
>
OK, so my question is then which triples belong to which section, if I have
something typical like

:know a ontolex:LexicalEntry ;
  ontolex:sense :know#Sense ;
  ontolex:canonicalForm :know#Form .

:know#Form ontolex:writtenRep "know"@eng .

:know#Sense ontolex:reference foaf:knows .

What is the lexicon and what is the lexicalization?
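
To make the question concrete, here is a minimal Python sketch (plain tuples, no RDF library) that splits the triples above under one naive rule. The rule, the function name partition and the set LEXICALIZATION_PREDICATES are assumptions for illustration only; nothing in lime.owl defines them.

```python
# The example triples above, as (subject, predicate, object) tuples.
TRIPLES = [
    (":know", "rdf:type", "ontolex:LexicalEntry"),
    (":know", "ontolex:sense", ":know#Sense"),
    (":know", "ontolex:canonicalForm", ":know#Form"),
    (":know#Form", "ontolex:writtenRep", '"know"@eng'),
    (":know#Sense", "ontolex:reference", "foaf:knows"),
]

# Assumption: only triples that point into the ontology count as
# "lexicalization"; everything else is treated as "lexicon".
LEXICALIZATION_PREDICATES = {"ontolex:reference"}


def partition(triples):
    """Split triples into (lexicon, lexicalization) by predicate."""
    lexicon, lexicalization = [], []
    for triple in triples:
        if triple[1] in LEXICALIZATION_PREDICATES:
            lexicalization.append(triple)
        else:
            lexicon.append(triple)
    return lexicon, lexicalization


lexicon, lexicalization = partition(TRIPLES)
print(len(lexicon), len(lexicalization))
```

Under this rule the ontolex:sense triple lands in the lexicon, although the sense arguably exists only to carry the reference, so it could just as well go on the other side. The split depends entirely on which rule is picked.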

Furthermore, if I add something from the synsem module, e.g.,

:know synsem:synBehavior :know#Frame .

:know#Frame synsem:synArg :know#arg1 , :know#arg2 .

:know#Sense synsem:subjOfProp :know#arg1 ;
  synsem:objOfProp :know#arg2 .

Where does this belong?

Furthermore, if I publish my data (ontology and lexicon) as a single file,
then it makes it difficult for an end user to figure out which bit is
which. VoID is much simpler and says that my dataset is described by either
a SPARQL endpoint, a data dump, a root resource or a URI lookup; this seems
hard to implement for tightly integrated ontology-lexica.


>>    - What is a "conceptualized linguistic resource"? This is not really
>>    clear to me.
>>
> Not sure about the name, but the idea was to refer to any resource like
> WordNet, that is, a resource providing lexical concepts that group
> semantically close senses of different words.
>
I agree the term is not great. Perhaps we don't need this; I'm not
sure what the value of having this class is.

>
>>    - How does a "lexical linkset" differ from a "linkset"? (i.e., do we
>>    need this class?)
>>
> It is a specialization that seemed useful to us, to highlight the
> "special nature" of the dataset for which we are providing links.
>
OK, this seems kind of unnecessary; perhaps we should consider removing this
too.

>
>>    - What is the range of lime:class? How does it differ from void:class?
>>
> The range is rdfs:Class. The difference lies in the domain. Indeed, the
> domain of void:class is dataset, while lime:class has domain
> ResourceCoverage.
>
Hmm... I wonder if the intention is right here. void:class allows you to
select all individuals of a given type, but most true ontologies don't have
any individuals; e.g., the DBpedia ontology has none, but the dataset has
many. I wonder if this is useful here. Furthermore, most lexicons don't
cover many true individuals (e.g., person names, company names, etc.), as
doing so is kind of dull and simplistic.

>
>>    - Shouldn't there be an object property linking a lexicalization to
>>    an ontology?
>>
> It is lexicalizedDataset. In our parlance, we use "dataset" to embrace
> both factual knowledge and domain descriptions.
>
Why not just call the property *ontology* then? This is the onto-lex group,
a lexicalization is between an ontology and a lexicon.

>
>>    - 'language' is already in the core OntoLex model, do we need it in
>>    lime?
>>
> We wrote that the unification of this property with the corresponding one
> in OntoLex will be a point of discussion.
>
>>
>>    - How do you count lexicalizations? i.e., is it the number of
>>    Lexicalization instances or the number of lexicalized reference/entry pairs?
>>
> There is a slight ambiguity with regard to this. A Lexicalization is
> really a collection of reference/entry pairs, which are individually
> referred to as lexicalizations (uncapitalized initial).
>
> If this ambiguity is unacceptable, we could consider alternative names for
> the Lexicalization class: perhaps LexicalMapping or LexicoSemanticMapping,
> or any other sensible name.
>
A reference/entry pair in the OntoLex model is called a Lexical Sense! So
the lexicalizations and the senses property must count the same thing,
right?
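
As a rough check of that claim, here is a minimal Python sketch that counts senses and distinct entry/reference pairs over a list of triples. The function names are hypothetical, the triples are plain tuples rather than a real RDF graph, and the sketch assumes each sense carries at most one ontolex:reference.

```python
# A tiny graph as (subject, predicate, object) tuples.
TRIPLES = [
    (":know", "ontolex:sense", ":know#Sense"),
    (":know#Sense", "ontolex:reference", "foaf:knows"),
]


def count_senses(triples):
    """Count distinct sense nodes attached to some lexical entry."""
    return len({o for _, p, o in triples if p == "ontolex:sense"})


def count_entry_reference_pairs(triples):
    """Count distinct (entry, reference) pairs that are linked via a sense."""
    # Assumption: at most one reference per sense, so a dict is enough.
    reference_of = {s: o for s, p, o in triples if p == "ontolex:reference"}
    return len({
        (s, reference_of[o])
        for s, p, o in triples
        if p == "ontolex:sense" and o in reference_of
    })


print(count_senses(TRIPLES), count_entry_reference_pairs(TRIPLES))
```

In this sketch the two counts can only diverge if a sense has no reference, or if two senses link the same entry to the same reference.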

>
>

>>    - What are the domains of the properties lexicalEntries, senses,
>>    references, etc.?
>>
> In the OWL file you should have the following information:
>
>    - lexicalEntries -> Lexicalization or ResourceCoverage or Lexicon
>    - senses -> Lexicalization or ResourceCoverage or Lexicon
>    - lexicalizations -> Lexicalization or ResourceCoverage
>    - references -> Lexicalization or ResourceCoverage
>
So... follow-up question: if I can put the lexical entry count on the
lexicalization object, what is the point of the resource coverage object?

>
>>    - Shouldn't we also count LexicalConcepts and Forms?
>>
> As I wrote in the previous email, we are open to suggestions about
> additional statistics.
>
OK, consider it suggested.

Regards,
John

>
>
> --
> Manuel Fiorelli
> PhD student in Computer and Automation Engineering
> Dept. of Civil Engineering and Computer Science
> University of Rome "Tor Vergata"
> Via del Politecnico 1
> 00133 Roma, Italy
>
> tel: +39-06-7259-7334
> skype: fiorelli.m
>
Received on Friday, 6 June 2014 18:16:50 UTC