- From: Aldo Gangemi <aldo.gangemi@cnr.it>
- Date: Wed, 9 Feb 2011 10:58:02 +0100
- To: Linked Data community <public-lod@w3.org>
- Cc: Aldo Gangemi <aldo.gangemi@cnr.it>, Enrico Daga <enrico.daga@cnr.it>, Alberto Salvati <alberto.salvati@CNR.IT>
- Message-Id: <AB0376DA-E71F-4914-A358-1709BED184CD@cnr.it>
Dear all,

we are happy to announce the release of the beta version of data.cnr.it and the Semantic Scout exploratory browser.

data.cnr.it [1] is the linked open data version of the scientific data from the Italian National Research Council, and it includes researchers, institutes, research programmes, publications, topics, etc. A Virtuoso-powered SPARQL endpoint is available at [4]; a top-down browser is available at [5]; a voiD description is at [6].

The Semantic Scout [2] is an experimental exploratory browser applied to the data.cnr.it datasets; see the paper published at EKAW2010 [3] for details.

data.cnr.it and the Semantic Scout have been designed by the Semantic Technology Lab ([7], see [8] for credits), which comprises semantic web researchers and engineers from ISTC-CNR (the Institute of Cognitive Sciences and Technologies of the Italian National Research Council) and from the Information Systems Unit of the Italian National Research Council.

We have followed linked data principles, and the datasets are based on modular, pattern-based OWL ontologies [9]. Data have been triplified from multiple CNR databases and enriched by means of OWL reasoning (ABox materialization and classification), as well as by NLP and graph mining techniques. For example, the topics for the researchers have been learnt by an automatic categorization system that matches researchers' textual signatures (textual records) against the textual signatures (pages) of DBpedia categories.

Current work includes more robust identity management and its possible integration with Okkam, a deeper voiD description of the datasets, entity linking to other LOD datasets (e.g. DBLP), more vocabulary alignment (currently limited to FOAF, SKOS, and DC), etc.

Regarding the last point, we are discussing whether or not vocabulary alignment should be reflected in the datasets by means of materialization. This choice has pervasive consequences on the size of the services vs. the datasets that enable linked data consumption: can the community offer any help on the pros and cons of either approach?

For example, if we declare (schema level):

    cnr:coauthor rdfs:subPropertyOf foaf:knows
    cnr:Researcher rdfs:subClassOf foaf:Person

and we have e.g. in the data (*simplified names*):

    cnrdata:AldoGangemi cnr:coauthor cnrdata:EnricoDaga
    cnrdata:AldoGangemi rdf:type cnr:Researcher

should we materialize an additional dataset containing e.g.:

    cnrdata:AldoGangemi foaf:knows cnrdata:EnricoDaga
    cnrdata:AldoGangemi rdf:type foaf:Person

or should that be provided by a SPARQL endpoint under some entailment regime?

Consider that this is not only a matter of SPARQL efficiency vs. amount of data, but also of data entanglement: e.g. when materialized, the topology of linked datasets would be severely complicated by the multityping of individuals.

Thanks for any advice (there does not seem to be any best practice yet).

Ciao
Aldo, Enrico, Alberto

[1] http://data.cnr.it
[2] http://bit/ly/semanticscout
[3] http://data.cnr.it/site/resources
[4] http://data.cnr.it/sparql/
[5] http://data.cnr.it/data/cnr/individuo/CNR
[6] http://data.cnr.it/data/http://data.cnr.it/dataset/
[7] http://stlab.istc.cnr.it
[8] http://data.cnr.it/site/contacts
[9] http://data.cnr.it/site/ontology

_____________________________________
Aldo Gangemi
Senior Researcher
Semantic Technology Lab (STLab)
Institute for Cognitive Science and Technology,
National Research Council (ISTC-CNR)
Via Nomentana 56, 00161, Roma, Italy
Tel: +390644161535
Fax: +390644161513
aldo.gangemi@cnr.it
http://www.stlab.istc.cnr.it
http://www.istc.cnr.it/createhtml.php?nbr=71
skype aldogangemi
okkam ID: http://www.okkam.org/entity/ok200707031186131660596
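
As a rough illustration of the materialization option raised in the message above, the following Python sketch computes the extra triples that the two schema axioms would entail for the two example data triples. It is only a sketch under stated assumptions: triples are represented as plain tuples of prefixed names, the function name `materialize` is hypothetical, and only two RDFS rules (rdfs7 for rdfs:subPropertyOf and rdfs9 for rdfs:subClassOf) are applied. It is not the pipeline used for data.cnr.it.

```python
# Assumption: triples are (subject, predicate, object) tuples of prefixed names.
SCHEMA = {
    ("cnr:coauthor", "rdfs:subPropertyOf", "foaf:knows"),
    ("cnr:Researcher", "rdfs:subClassOf", "foaf:Person"),
}
DATA = {
    ("cnrdata:AldoGangemi", "cnr:coauthor", "cnrdata:EnricoDaga"),
    ("cnrdata:AldoGangemi", "rdf:type", "cnr:Researcher"),
}


def materialize(schema, data):
    """Return only the NEW triples entailed by rdfs7 and rdfs9 (hypothetical helper)."""
    sub_prop, sub_class = {}, {}
    for s, p, o in schema:
        if p == "rdfs:subPropertyOf":
            sub_prop.setdefault(s, set()).add(o)
        elif p == "rdfs:subClassOf":
            sub_class.setdefault(s, set()).add(o)

    known = set(data)
    frontier = set(data)
    # Iterate to a fixpoint so that chains of sub-properties/sub-classes are followed.
    while frontier:
        derived = set()
        for s, p, o in frontier:
            # rdfs7: (s p o), (p subPropertyOf q)  =>  (s q o)
            for q in sub_prop.get(p, ()):
                derived.add((s, q, o))
            # rdfs9: (s rdf:type c), (c subClassOf d)  =>  (s rdf:type d)
            if p == "rdf:type":
                for d in sub_class.get(o, ()):
                    derived.add((s, "rdf:type", d))
        frontier = derived - known
        known |= frontier
    return known - set(data)


inferred = materialize(SCHEMA, DATA)
for triple in sorted(inferred):
    print(*triple)
```

In this toy case the additional dataset is as large as the asserted data, which hints at the size trade-off mentioned in the message; the alternative would leave the data as asserted and have the endpoint compute the same triples at query time.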
Received on Wednesday, 9 February 2011 09:59:08 UTC