- From: Philipp Cimiano <cimiano@cit-ec.uni-bielefeld.de>
- Date: Thu, 13 Mar 2014 11:17:42 +0100
- To: Armando Stellato <stellato@info.uniroma2.it>, "'John P. McCrae'" <jmccrae@cit-ec.uni-bielefeld.de>
- CC: 'Manuel Fiorelli' <fiorelli@info.uniroma2.it>, public-ontolex@w3.org
- Message-ID: <532185C6.1080801@cit-ec.uni-bielefeld.de>
Dear all,

ok, so we have clarified that per se it is fine to include materialized results of pre-defined SPARQL queries as new vocabulary elements. So we are a step further, guys ;-) Whether or not we want to include properties related to linguistic resource coverage is then the real point of discussion, I think. So let's focus on this point.

Other than that: maybe it is not so important whether the values can be computed using SPARQL or whether we need some procedural component to compute them (as in the LIME Java API mentioned by Armando). My point was rather: let's define exactly what we mean by these properties by giving them an exact semantics. It is fine as long as this semantics is made explicit. But the point is: if not all creators of lexica use the properties in the same way, then they become sort of useless; see our recent discussion of the "confidence" property for indicating confidence in a translation: it is quite useless if people adopt completely different interpretations of its value. So rather than having SPARQL CONSTRUCT statements for most metadata properties, let's give them a precise semantics so that anyone can compute the values of the properties consistently with that semantics. Does this make sense?

Talk to you all tomorrow.

Philipp.

On 08.03.14 20:44, Armando Stellato wrote:
>
> Dear John,
>
> well, I'm a bit puzzled, in that this is surely worth discussing, but
> it's a completely orthogonal topic again. The fact that Philipp
> mentioned the possibility of defining their semantics through SPARQL
> does not change anything about the nature of these properties; so, if
> you found them useless because of their redundancy with the data, they
> were useless/redundant even before.
>
> Maybe we should synthesize a few aspects and discuss them in a page of
> the wiki. What do you think? The impression is that in the emails we
> are opening new topics instead of closing the open ones, so it may be
> worth having separate threads.
> Please let us know. If you feel we are almost close to the end, we may
> even go along with emails (maybe with specific threads).
>
> Btw, to reply to your specific question:
>
> The point of metadata is not to optimize commonly run SPARQL queries,
> for two primary reasons: firstly, it bulks up the model and instances
> of the model with triples for these 'pre-compiled' queries, and
> secondly, it is very hard to predict what queries an end-user will
> want to run. It seems that the kind of metadata we are proposing to
> model consists nearly entirely of pre-compiled queries, of
> questionable practical application. That is, I ask a simple question:
> /if we can already achieve resource interoperability for OntoLex with
> SPARQL, why the heck do we need metadata anyway??/
>
> Personally, as an engineer, I'm biased towards considering "redundancy
> the evil" and keeping information to a minimum (so I would tend to
> agree with your point). But the engineering 101 manual tells you that
> you may sometimes give up orthodoxy on the above principle if it
> greatly improves performance, scalability, etc.
>
> Furthermore, instead of trivially giving up, you should designate how,
> when and where the redundancy points are defined (whatever system you
> are speaking about).
>
> Now, narrowing down to our case, we have a clear point: the VoID file,
> which is a surrogate of a dataset, contains its metadata and is always
> updated following updates to the dataset's content: no danger of
> dangling, out-of-date redundant information then.
> We also have a clear scenario: packs of spiders roaming around the web
> and getting plenty of useful information from tons of different
> datasets without stressing their SPARQL endpoints; mediators examining
> metadata from multiple resources and taking decisions very quickly,
> etc.
>
> But I'm just a poor guy :) so, beyond my personal view, let me mention
> some notable predecessors:
>
> Already mentioned by Manuel in his email of today, we have VOAF:
> http://lov.okfn.org/vocab/voaf/v2.3/index.html
>
> ...but VOAF is not a standard...
>
> ...talking about standards, ladies and gentlemen, here is VoID itself
> and its many SPARQL-deducible properties!
>
> https://code.google.com/p/void-impl/wiki/SPARQLQueriesForStatistics
>
> ...and, to happily close my defense: in any case, Manuel just
> confirmed in his email that I should have thought one second longer
> about the SPARQL deducibility of LIME's properties :-)
>
> Some of them are in fact SPARQL-deducible, but it seems the one we
> took as an example (lime:languageCoverage
> <http://art.uniroma2.it/ontologies/lime#languageCoverage>) is exactly
> one of those not so trivial to write (maybe I'm not an expert with
> CONSTRUCTs, but I would say not possible at all).
>
> In the LIME module, we used an RDF API and plain Java post-processing
> to compute them, so I did not recall which ones were simple SPARQL
> CONSTRUCTs and which ones needed more processing.
>
> Cheers,
>
> Armando

-- 
Prof. Dr. Philipp Cimiano
Phone: +49 521 106 12249
Fax: +49 521 106 12412
Mail: cimiano@cit-ec.uni-bielefeld.de

Forschungsbau Intelligente Systeme (FBIIS)
Raum 2.307
Universität Bielefeld
Inspiration 1
33619 Bielefeld
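[Editor's note: as an illustration of the SPARQL deducibility discussed above, a simple VoID statistic such as void:triples can be materialized with a CONSTRUCT along the lines of the queries on the void-impl wiki page Armando links. This is a sketch, not from the thread itself; the dataset URI `:myDataset` is hypothetical, while the `void:` property comes from the VoID vocabulary.]

```sparql
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX :     <http://example.org/>

# Materialize the total triple count of the store as VoID metadata
# for a (hypothetical) dataset resource :myDataset.
CONSTRUCT { :myDataset void:triples ?count }
WHERE {
  { SELECT (COUNT(*) AS ?count) WHERE { ?s ?p ?o } }
}
```

A per-language aggregate (e.g. counting lexical entries per language tag) is likewise expressible as a SELECT with GROUP BY; what makes a property like lime:languageCoverage awkward, as Armando notes, is turning such an aggregate into a ratio over the whole lexicon and emitting it as a single triple from within one CONSTRUCT.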
Received on Thursday, 13 March 2014 10:18:09 UTC