Re: [ANN] LOD from Italian National Research Council from Hugh Glaser on 2011-02-09 (public-lod@w3.org from February 2011)

From: Hugh Glaser <hg@ecs.soton.ac.uk>
Date: Wed, 9 Feb 2011 13:33:45 +0000
To: Aldo Gangemi <aldo.gangemi@cnr.it>
CC: Linked Data community <public-lod@w3.org>, Enrico Daga <enrico.daga@cnr.it>, Alberto Salvati <alberto.salvati@cnr.it>
Message-ID: <EMEW3|afc0627626608157cc486925dc0c640an18DXl02hg|ecs.soton.ac.uk|BD6C940D-F798->
Hi Aldo,
Nice stuff.
Regarding vocabulary alignment.
I would suggest you might want to keep it out of your dataset.
Vocabulary alignment is a matter of opinion; of course your dataset is opinion as well, but it is the opinion of the organisation, whereas the vocabulary alignment you talk about might be somebody else's opinion.
If the dataset owner agrees that, for example, dct:creator aligns with pubblicazioni:autore, then perhaps you can include it.
Of course, there is a question about why dct:creator was not used in the first place, but it can be neat to simply use all your own properties, so that's OK.
But if the alignment is to go in the dataset, it should be part of the knowledge capture process, not added by a third party.

In fact, the example you choose is great.
It is not at all clear to me that 
cnr:coauthor rdfs:subPropertyOf foaf:knows
is actually what an organisation would want to say.
Even with the loosest meaning of foaf:knows, there will be co-authors who do not foaf:knows each other (certainly in some fields).
And some people would be upset that their organisation was publishing data stating that they did.
(I just checked the latest edition of Nature, and the two articles each have upwards of 50 authors from all over the world; I'm sure many of them have never communicated with each other, apart from this article.)
One of the advantages of using your own ontology is that you are never saying anything other than what you meant (whatever that might be :-) )

So if a consumer of the data wanted to assert
cnr:coauthor rdfs:subPropertyOf foaf:knows
that is up to them and would be fine, but imposing it does not seem right to me.

And to help them, you might provide a separate document containing the alignments, so that they can pick them up if they want.
And our policy is to do exactly the same with the identity management thing as well, which is actually a similar problem (and I would be happy to discuss how to do that with you, but I think we would need to go off-list for that, as we have had many discussions on the list about it ;-) )
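As a minimal sketch of the separate-alignment-document approach suggested above, assume the data and the alignments are both available as sets of (subject, predicate, object) tuples, with prefixed names treated as plain strings. A consumer who accepts the alignment applies the RDFS subproperty and subclass rules (rdfs7 and rdfs9 in the RDF Semantics specification) locally, and the published dataset is never touched. ALIGNMENTS and apply_alignments are hypothetical names for illustration, not anything published at data.cnr.it.

```python
# Consumer-side alignment sketch: the publisher's data and an optional,
# separately published alignment document are kept apart, and the consumer
# decides whether to apply the alignment. Triples are plain (s, p, o) tuples;
# prefixed names are treated as opaque strings (an assumption for brevity).

SUBPROP = "rdfs:subPropertyOf"
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

# Data as published by the organisation (simplified names from the thread).
DATA = {
    ("cnrdata:AldoGangemi", "cnr:coauthor", "cnrdata:EnricoDaga"),
    ("cnrdata:AldoGangemi", TYPE, "cnr:Researcher"),
}

# Hypothetical separate alignment document that a consumer may choose to load.
ALIGNMENTS = {
    ("cnr:coauthor", SUBPROP, "foaf:knows"),
    ("cnr:Researcher", SUBCLASS, "foaf:Person"),
}


def apply_alignments(data, alignments):
    """Return the triples entailed by rules rdfs7 and rdfs9 that are not
    already in `data`. Runs to a fixpoint, so chains of alignments work too."""
    sub_props = {(s, o) for s, p, o in alignments if p == SUBPROP}
    sub_classes = {(s, o) for s, p, o in alignments if p == SUBCLASS}
    closed = set(data)
    changed = True
    while changed:
        changed = False
        new = set()
        for s, p, o in closed:
            # rdfs7: (s p o) and (p subPropertyOf q) => (s q o)
            for sub, sup in sub_props:
                if p == sub:
                    new.add((s, sup, o))
            # rdfs9: (s type C) and (C subClassOf D) => (s type D)
            if p == TYPE:
                for sub, sup in sub_classes:
                    if o == sub:
                        new.add((s, TYPE, sup))
        new -= closed
        if new:
            closed |= new
            changed = True
    return closed - set(data)


if __name__ == "__main__":
    for triple in sorted(apply_alignments(DATA, ALIGNMENTS)):
        print(*triple)
```

Run as a script, it prints the two triples Aldo asks about materializing (cnrdata:AldoGangemi foaf:knows cnrdata:EnricoDaga, and cnrdata:AldoGangemi rdf:type foaf:Person). A consumer who does not accept that co-authors know each other simply drops the cnr:coauthor line from ALIGNMENTS.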

I know I haven't tackled the technical issues much, which is what you are asking, but I always start at the socio :-)

Best
Hugh

On 9 Feb 2011, at 09:58, Aldo Gangemi wrote:

> Dear all, we are happy to announce the release of the beta version of data.cnr.it and the Semantic Scout exploratory browser.
> 
> data.cnr.it [1] is the linked open data version of the scientific data from the Italian National Research Council, and it includes researchers, institutes, research programmes, publications, topics, etc.
> A Virtuoso-powered SPARQL endpoint is available at [4]; a top-down browser is available at [5]; a voiD description is at [6].
> 
> The Semantic Scout [2] is an experimental exploratory browser applied to the data.cnr.it datasets; see the paper published at EKAW 2010 [3] for details.
> 
> data.cnr.it and the Semantic Scout have been designed by the Semantic Technology Lab ([7], see [8] for credits) that comprises semantic web researchers and engineers from ISTC-CNR (the Institute of Cognitive Sciences and Technologies of the Italian National Research Council), and from the Information Systems Unit of the Italian National Research Council.
> 
> We have used linked data principles, and the datasets are based on modular OWL ontologies designed with a pattern-based approach [9]. Data have been triplified from multiple CNR databases and enriched by means of OWL reasoning (ABox materialization and classification), as well as by NLP and graph mining techniques; e.g. the researchers' topics have been learnt by an automatic categorization system that matches the researchers' textual signatures (textual records) against the textual signatures (pages) of DBpedia categories.
> 
> Current work includes a more robust identity management (possibly integrated with Okkam), a deeper voiD description of the datasets, entity linking to other LOD datasets (e.g. DBLP), more vocabulary alignment (currently limited to FOAF, SKOS, and DC), etc.
> 
> Regarding the last point, we are discussing whether vocabulary alignment should be reflected in the datasets by means of materialization. This choice has pervasive consequences for the relative size of the services vs. the datasets that enable linked data consumption: can the community help with the pros and cons of either approach? For example, if we declare (schema level):
> 
> cnr:coauthor rdfs:subPropertyOf foaf:knows
> cnr:Researcher rdfs:subClassOf foaf:Person
> 
> and we have e.g. in the data (*simplified names*):
> 
> cnrdata:AldoGangemi cnr:coauthor cnrdata:EnricoDaga
> cnrdata:AldoGangemi rdf:type cnr:Researcher
> 
> should we materialize an additional dataset containing e.g.:
> 
> cnrdata:AldoGangemi foaf:knows cnrdata:EnricoDaga
> cnrdata:AldoGangemi rdf:type foaf:Person
> 
> or should that be provided by a SPARQL endpoint under some entailment regime?
> 
> Consider that this is not only a matter of SPARQL efficiency vs. amount of data, but also of data entanglement: e.g. when materialized, the topology of linked datasets would be severely complicated by the multi-typing of individuals.
> 
> Thanks for any advice (there does not seem to be any best practice yet)
> Ciao
> Aldo, Enrico, Alberto
> 
> [1] http://data.cnr.it
> [2] http://bit.ly/semanticscout
> [3] http://data.cnr.it/site/resources
> [4] http://data.cnr.it/sparql/
> [5] http://data.cnr.it/data/cnr/individuo/CNR
> [6] http://data.cnr.it/data/http://data.cnr.it/dataset/
> [7] http://stlab.istc.cnr.it
> [8] http://data.cnr.it/site/contacts
> [9] http://data.cnr.it/site/ontology
> 
> 
> _____________________________________
> 
> Aldo Gangemi
> Senior Researcher
> Semantic Technology Lab (STLab)
> Institute for Cognitive Science and Technology,
> National Research Council (ISTC-CNR) 
> Via Nomentana 56, 00161, Roma, Italy 
> Tel: +390644161535
> Fax: +390644161513
> aldo.gangemi@cnr.it
> http://www.stlab.istc.cnr.it
> http://www.istc.cnr.it/createhtml.php?nbr=71
> skype aldogangemi
> okkam ID: http://www.okkam.org/entity/ok200707031186131660596
> 
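Aldo's other option, leaving the dataset as published and answering queries under an entailment regime, can be sketched as query rewriting: a request for foaf:knows is expanded at lookup time to every property declared, directly or transitively, as a subproperty of it. This is an illustrative Python sketch that assumes triples are (subject, predicate, object) tuples of prefixed names treated as plain strings; subproperties and match are hypothetical helpers, not the behaviour of Virtuoso or any particular endpoint. In SPARQL 1.1 a similar effect is commonly written by hand with a property path such as ?p rdfs:subPropertyOf* foaf:knows . ?x ?p ?y

```python
# Query-time (backward) reasoning sketch: nothing extra is stored; a query
# for a property is expanded to all of its subproperties at lookup time.
# Triples are (s, p, o) tuples of prefixed names treated as plain strings
# (an assumption for brevity).

SUBPROP = "rdfs:subPropertyOf"

SCHEMA = {("cnr:coauthor", SUBPROP, "foaf:knows")}
DATA = {("cnrdata:AldoGangemi", "cnr:coauthor", "cnrdata:EnricoDaga")}


def subproperties(prop, schema):
    """Return prop plus every property that is (transitively) declared a
    subproperty of it, i.e. the reflexive-transitive closure."""
    result = {prop}
    frontier = [prop]
    while frontier:
        current = frontier.pop()
        for s, p, o in schema:
            if p == SUBPROP and o == current and s not in result:
                result.add(s)
                frontier.append(s)
    return result


def match(subject, prop, data, schema):
    """Answer the pattern (subject, prop, ?o) as if the subproperty
    entailments had been materialized, without storing them."""
    props = subproperties(prop, schema)
    return {o for s, p, o in data if s == subject and p in props}


if __name__ == "__main__":
    print(match("cnrdata:AldoGangemi", "foaf:knows", DATA, SCHEMA))
    print(len(DATA))  # the stored data is unchanged: still one triple
```

The trade-off Aldo raises shows up directly: the stored data stays at one triple and no individual gains extra types, but every lookup pays for computing the subproperty closure.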

-- 
Hugh Glaser,  
              Intelligence, Agents, Multimedia
              School of Electronics and Computer Science,
              University of Southampton,
              Southampton SO17 1BJ
Work: +44 23 8059 3670, Fax: +44 23 8059 3045
Mobile: +44 78 9422 3822, Home: +44 23 8061 5652
http://www.ecs.soton.ac.uk/~hg/
Received on Wednesday, 9 February 2011 13:34:27 UTC