Re: LIME proposal for the OntoLex W3C Community Group from Philipp Cimiano on 2014-03-13 (public-ontolex@w3.org from March 2014)

From: Philipp Cimiano <cimiano@cit-ec.uni-bielefeld.de>
Date: Thu, 13 Mar 2014 11:08:33 +0100
To: "John P. McCrae" <jmccrae@cit-ec.uni-bielefeld.de>, Armando Stellato <stellato@info.uniroma2.it>
CC: Manuel Fiorelli <fiorelli@info.uniroma2.it>, "public-ontolex@w3.org" <public-ontolex@w3.org>
Message-ID: <532183A1.905@cit-ec.uni-bielefeld.de>

Hi John,

  I agree here as well. The metadata you mention below seems very useful
to include in lime, or we should at least mention somewhere that it is
good practice to reuse the VoID metadata when creating an ontolex lexicon.

An example would suffice.
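
One possible shape for such an example, as a minimal sketch: the Python snippet below prints a Turtle description that covers the starred items from John's list using existing VoID and Dublin Core terms. All example.org URIs, the lexicon title, and the date are made-up placeholders. Mapping "where published" to void:dataDump and "previous version" to dcterms:replaces is an assumption for illustration, not something agreed in this thread.

```python
# Sketch: describe an ontolex lexicon with existing VoID / Dublin Core terms.
# Every example.org URI and every literal below is a made-up placeholder.

PREFIXES = {
    "void": "http://rdfs.org/ns/void#",
    "dcterms": "http://purl.org/dc/terms/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}

LEXICON = "<http://example.org/lexicon>"

# (predicate, object) pairs for one hypothetical lexicon dataset.
DESCRIPTION = [
    ("a", "void:Dataset"),
    ("dcterms:title", '"Example Lexicon"@en'),                  # Name
    ("dcterms:creator", "<http://example.org/people/jane>"),    # Author(s)
    ("dcterms:issued", '"2014-03-13"^^xsd:date'),               # When published
    ("void:dataDump", "<http://example.org/lexicon.ttl>"),      # Where published (assumed mapping)
    ("void:feature", "<http://www.w3.org/ns/formats/Turtle>"),  # Formats
    ("void:sparqlEndpoint", "<http://example.org/sparql>"),     # SPARQL endpoint
    ("dcterms:replaces", "<http://example.org/lexicon/v1>"),    # Previous version (assumed mapping)
]


def to_turtle(subject, pairs, prefixes):
    """Serialize one subject and its predicate/object pairs as Turtle text."""
    lines = [f"@prefix {name}: <{ns}> ." for name, ns in prefixes.items()]
    lines.append("")
    lines.append(subject)
    body = [f"    {pred} {obj}" for pred, obj in pairs]
    lines.append(" ;\n".join(body) + " .")
    return "\n".join(lines)


turtle = to_turtle(LEXICON, DESCRIPTION, PREFIXES)
print(turtle)
```

None of these triples can be derived by querying the lexicon data itself, which is why they fit the role of metadata discussed below.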

So the question is: what additional properties do we need beyond those
that John mentions below?
We should discuss this on Friday.

Philipp.

On 08.03.14 17:01, John P. McCrae wrote:
> Hi,
>
> So this troubles me:
>
>     The values for most of these properties could be calculated using
>     SPARQL construct statements, it seems, in the sense that some
>     information is aggregated and added as the explicit value of some
>     lime property. Fair enough, it saves people the effort of running
>     the SPARQL queries over the dataset themselves, making this
>     information readily accessible.
>
>
> The point of metadata is not to optimize commonly run SPARQL queries,
> for two primary reasons: firstly, it bulks up the model and instances
> of the model with triples for these 'pre-compiled' queries, and
> secondly, it is very hard to predict what queries an end-user will
> want to run. It seems that the kind of metadata we are proposing to
> model is almost entirely pre-compiled queries, and is of questionable
> practical application. That is, I ask a simple question: /if we can
> achieve resource interoperability for OntoLex already with SPARQL, why
> the heck do we need metadata anyway??/
>
> As such, I have been reading a few papers and trying to figure out 
> what metadata we can reasonably expect most users of the model to use. 
> In particular, following this paper 
> <http://www.niso.org/publications/press/UnderstandingMetadata.pdf>, I 
> propose a list of metadata 'desiderata' and divide them into three 
> categories:
>
>   * *Descriptive Metadata*
>       o Name*
>       o Author(s)*
>       o Size (Entry count, Sense count, Reference count etc.)
>   * *Structural Metadata*
>       o Which Linguistic Description Ontology is used
>       o Module coverage, e.g., does this resource have syn-sem modelling?
>   * *Administrative Metadata*
>       o When published*
>       o Where published*
>       o Formats*
>       o SPARQL endpoint*
>       o Previous version*
>
> * The vocabulary already exists, e.g., Dublin Core, VoID
>
> Most items in the list above are vital for understanding a resource
> and in most cases cannot be inferred from examining the dataset
> alone; the rest, especially entry counts, are useful for resource
> repositories such as DataHub.
>
> Regards,
> John
>
>
> On Sat, Mar 8, 2014 at 5:06 AM, Armando Stellato
> <stellato@info.uniroma2.it> wrote:
>
>     Dear Philipp, John, all,
>
>     Yes, absolutely agree that we should provide the SPARQL
>     constructs, as an operational way to express their semantics. I
>     think we already have them, as Manuel has written them in a LIME
>     exporter component (we should just readjust the output according
>     to the structure we want to build; see previous emails).
>     Regarding the fact that the metadata should be obtainable from the
>     data present in the content, in general I agree, and the general
>     idea is that those sorts of linksets could be subsets of different
>     datasets and thus appear in the most appropriate void file (from
>     the case of three distinct datasets up to the case in which
>     ontology, lexicon and lexicalization collapse into one dataset,
>     yet with the lexicalization being identifiable at the metadata
>     level as a specific subset of the whole).
>     Yet, to strictly respect your rule, there is a problematic case.
>     Provided that it is important to express the lexical coverage for
>     resource R (for Rs belonging to an ontology), and that this
>     requires being able to enumerate the instances of R, this amounts
>     to saying that, in the case of a separate lexicon and ontology,
>     the metadata should be in the void file of the lexicon. Thus, to
>     iterate over the Rs, the lexicon would necessarily have to
>     owl:import the ontology, or the data from the ontology would not
>     be available. This would break your rule that the metadata should
>     always be calculable in terms of the available data.
>
>     I think our view here is pretty simple:
>     we consider the triad: <ontology, lexicon, lexicalization>. In
>     this triad, ontology and lexicon may happen to be independent, but
>     the lexicalization will always have (at least conceptually) a
>     known dependency on the content of the other two (i.e., whoever
>     wrote the lexicalization did it with the target ontology and
>     chosen lexicon at hand). Thus I think it should in any case be
>     legal for the metadata about the lexicalization to tell which
>     percentage of (all) the Rs from the ontology is covered by the
>     lexicalization.
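
To make the coverage idea concrete, here is a minimal sketch in plain Python, assuming the ontology's resources and the lexicalization's links are both at hand. The `coverage` function, the `ex:`/`lex:` identifiers, and the pair-based representation of links are illustrative assumptions, not part of LIME.

```python
def coverage(ontology_resources, lexicalization_links):
    """Fraction of ontology resources that have at least one lexical entry.

    ontology_resources: iterable of resource identifiers (the Rs).
    lexicalization_links: iterable of (lexical_entry, resource) pairs.
    """
    resources = set(ontology_resources)
    if not resources:
        return 0.0
    # Only count links whose target is a resource of this ontology.
    covered = {res for _entry, res in lexicalization_links if res in resources}
    return len(covered) / len(resources)


# Hypothetical data: four ontology resources, four lexicalization links.
ontology = ["ex:Car", "ex:Bike", "ex:Train", "ex:Plane"]
links = [
    ("lex:car", "ex:Car"),
    ("lex:automobile", "ex:Car"),  # second entry for the same resource
    ("lex:bike", "ex:Bike"),
    ("lex:boat", "ex:Boat"),       # targets a resource outside the ontology
]

print(f"{coverage(ontology, links):.0%}")  # prints "50%"
```

The denominator comes from the ontology and the numerator from the lexicalization, so the figure cannot be computed from an independent lexicon alone.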
>
>     Would that work?
>
>     Cheers,
>
>     Armando
>
>
>
>
>     ------------------------------------------------------------------------
>     From: Philipp Cimiano <cimiano@cit-ec.uni-bielefeld.de>
>     Sent: 07/03/2014 22.43
>     To: Armando Stellato <stellato@info.uniroma2.it>; 'John
>     McCrae' <jmccrae@cit-ec.uni-bielefeld.de>
>     Cc: 'Manuel Fiorelli' <fiorelli@info.uniroma2.it>;
>     public-ontolex@w3.org
>     Subject: Re: LIME proposal for the OntoLex W3C Community Group
>
>     Dear Armando, all,
>
>     This is a second email on the metadata, referring in particular to
>     the aggregating properties.
>
>     Many of the properties that we are proposing in the metadata
>     module are aggregating properties: number of lexical entries,
>     average number of lexical entries etc.
>
>     We sort of agreed that these are computed locally for the dataset
>     in question, without consulting external lexica etc., right?
>
>
>
>     However, in order to properly document the semantics of the lime
>     properties we introduce, would it not be feasible to indicate a
>     SPARQL construct query that computes the property value? In that
>     way we would clearly define the semantics of these metadata
>     properties.
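
As a minimal sketch of what such a definition could look like for an entry count: the snippet below holds an illustrative SPARQL 1.1 CONSTRUCT query as a string and mirrors it with a plain-Python function over (subject, predicate, object) tuples. The `ex:entry` and `ex:entryCount` names are placeholders rather than actual lime or ontolex properties, and the query is shown but not executed here.

```python
from collections import defaultdict

# Illustrative query only; ex:entry and ex:entryCount are placeholder names.
ENTRY_COUNT_QUERY = """
PREFIX ex: <http://example.org/ns#>
CONSTRUCT { ?lexicon ex:entryCount ?n }
WHERE {
  { SELECT ?lexicon (COUNT(DISTINCT ?entry) AS ?n)
    WHERE { ?lexicon ex:entry ?entry }
    GROUP BY ?lexicon }
}
"""


def entry_counts(triples):
    """Plain-Python mirror of the query: returns the constructed triples."""
    entries = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == "ex:entry":
            entries[subject].add(obj)  # a set gives COUNT(DISTINCT ...)
    return [(lex, "ex:entryCount", len(objs)) for lex, objs in sorted(entries.items())]


# Hypothetical dataset with one lexicon and two distinct entries.
data = [
    ("ex:lexicon", "ex:entry", "ex:cat"),
    ("ex:lexicon", "ex:entry", "ex:dog"),
    ("ex:lexicon", "ex:entry", "ex:cat"),    # duplicate, counted once
    ("ex:cat", "ex:sense", "ex:cat_sense"),  # ignored: not an entry link
]

print(entry_counts(data))  # prints [('ex:lexicon', 'ex:entryCount', 2)]
```

Stating the query alongside each property would pin down, for instance, whether duplicates are counted and which link the count ranges over.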
>
>     What do you think?
>
>     Philipp.
>
>
>     On 06.03.14 20:17, Armando Stellato wrote:
>>
>>     Dear Philipp and John,
>>
>>     no need to say sorry, you are coordinating a whole community
>>     group, we cannot say the same on our side, yet we are no quicker
>>     than you in replying :D
>>
>>     You raise an important point, the solution of which actually
>>     opens up an interesting opportunity for other important aspects
>>     at the level of the web architecture of Ontolex.
>>
>>     Before we delve further into the details, let us ask one more
>>     question:
>>
>>     What is the relationship between ontologies and lexica: is it
>>     1:n (an ontology may have multiple lexica) or m:n (as before,
>>     plus the same lexicon may be connected to multiple ontologies)?
>>     A strictly related question is: “is a lexicon built specifically
>>     for an ontology?”.
>>
>>     Having ported WordNet to Ontolex should already give the answer
>>     to that (WordNet exists a priori, independently of any ontology,
>>     and thus it should be one example in favor of the m:n hypothesis,
>>     though we may think of a Lexicon as something importing WordNet
>>     and extending it to be a lexicon for a given ontology).
>>
>>     In case the m:n hypothesis is confirmed, we should think about
>>     some form of binding, as a third object implementing the
>>     connection between an independent lexicon and an ontology.
>>
>>     I think I already asked something related to that when I had some
>>     doubts about how to deal with compound terms: if a lexicon exists
>>     independently, it will probably not contain some compounds needed
>>     to describe resources of the ontology, so we cannot assume these
>>     will always be available (at the time of my question, I
>>     remember I was told: “for things like “red car”, you should
>>     foresee a dedicated entry in the lexicon, though it can then be
>>     decomposed through the decomposition module”, thus implying that
>>     the lexicon has to exist FOR a given ontology).
>>
>>     Probably I’m missing something here, but I think these are
>>     fundamental aspects which should be made clear in the wiki pages
>>     about the overall architecture and the usage of the model.
>>
>>     Ok, sorry for the long introduction, but as you will see, it is
>>     related to our topic… though we may have managed to handle this
>>     independently of the above. So, back to the topic…
>>
>>     …Our model relates to a void file, but this file could be, for
>>     instance, not the void file of an ontology, but the void file of
>>     (something similar to) a void:linkset which binds a lexicon to an
>>     ontology. To cover also the need you express at the end of the
>>     email, we could propose the following changes:
>>
>>     1) A lime:lexicon property,
>>
>>        a. domain: lime:LanguageCoverage (the class, obviously)
>>
>>        b. range: void:Dataset (or an appropriate subclass
>>           lime:Lexicon, to define a dataset which contains
>>           linguistic expressions for some dataset). Note that a
>>           void:Dataset containing both conceptual and lexical info
>>           would be the lexicon of itself!
>>
>>     2) lime:lexicalModel (old linguisticModel, moved to having its
>>        domain set to LanguageCoverage)
>>
>>     so we could have a structure like that:
>>
>>     void:Dataset  --lime:languageCoverage--> lime:LanguageCoverage
>>     --lime:lexicon--> void:Dataset
>>
>>     --lime:lexicalModel--> (rdfs:, skos:, skosxl:, ontolex: )
>>
>>     --lime:resourceCoverage--> (usual stat info)
>>
>>     But then, we would have another issue…what is a lexicon? If a
>>     lexicon is something independent of the “enrichment” of an
>>     ontology with respect to a language, and lives on its own, then,
>>     here, in our case, we are more interested in knowing the third
>>     element we were mentioning above, that is, the “man in the
>>     middle” providing links between a conceptual resource and a
>>     lexicon. Thus, with just a terminological change, (lexicon -->
>>     lexicalization), and relying on the fact that this representation
>>     delegates to the lexicalization the pointers to the lexicon:
>>
>>     void:Dataset --lime:languageCoverage--> lime:Lan
>>
>
>     [the original message is not included]
>
>


-- 

Prof. Dr. Philipp Cimiano

Phone: +49 521 106 12249
Fax: +49 521 106 12412
Mail: cimiano@cit-ec.uni-bielefeld.de

Forschungsbau Intelligente Systeme (FBIIS)
Raum 2.307
Universität Bielefeld
Inspiration 1
33619 Bielefeld
Received on Thursday, 13 March 2014 10:09:01 UTC