Re: [ANN] LOD from Italian National Research Council from Enrico Daga on 2011-02-14 (public-lod@w3.org from February 2011)

From: Enrico Daga <enrico.daga@cnr.it>
Date: Mon, 14 Feb 2011 19:04:38 +0100
To: Kingsley Idehen <kidehen@openlinksw.com>
Cc: bvilla@delicias.dia.fi.upm.es, Aldo Gangemi <aldo.gangemi@cnr.it>, Hugh Glaser <hg@ecs.soton.ac.uk>, Linked Data community <public-lod@w3.org>, Alberto Salvati <alberto.salvati@cnr.it>
Message-ID: <AANLkTiknvnxBWdRgSOhBJrd5PBkLdif3rLD_kLVF7R3g@mail.gmail.com>
Hi Boris, Kingsley, all,

On 9 February 2011 23:31, Kingsley Idehen <kidehen@openlinksw.com> wrote:
> On 2/9/11 5:01 PM, Boris Villazón Terrazas wrote:
>>
>> Hi Aldo et al.
>>
>> Nice stuff! ;-)
>>
>> Regarding your question, I can tell you what we did within the context of
>> GeoLinkedData [1]
>> We have separated the model/vocabulary from the data. So we have the
>> model/vocabulary in a Named Graph and the data in other Named Graph.
Yes, I agree this is a good way to do it. We also have three graphs
(model, data, model+data), and I guess this can be considered good
practice.
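
To make the partitioning concrete, here is a minimal sketch in Python.
It is only an illustration, not our actual setup: the graph IRIs under
example.org and the tiny in-memory quad list are assumptions, and a real
store would do the same selection with named graphs in SPARQL.

```python
# Hypothetical graph names (assumptions for illustration only).
MODEL = "http://example.org/graph/model"
DATA = "http://example.org/graph/data"

# Each quad is (subject, predicate, object, graph).
QUADS = [
    ("geoes:Provincia", "rdfs:subClassOf", "fao:territory", MODEL),
    ("geores:Madrid", "rdf:type", "geoes:Provincia", DATA),
]


def triples_in(quads, graphs):
    """Return the triples stored in any of the given named graphs."""
    wanted = set(graphs)
    return [(s, p, o) for (s, p, o, g) in quads if g in wanted]


# The three views: model, data, and model+data.
model_only = triples_in(QUADS, [MODEL])
data_only = triples_in(QUADS, [DATA])
model_plus_data = triples_in(QUADS, [MODEL, DATA])
```

The point is that "model+data" does not need to be a third copy of the
triples; it can be a view over the two graphs chosen at query time.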

>> According to Kingsley it seems to be we are going to the right direction
>> .... Thanks Kingsley
>>
>> We have similar cases as you have, for example
>> geoes:Provincia rdfs:subClassOf fao:territory [2]
>> and a particular resource of type geoes:Provincia can be geores:Madrid [3]
>>
>> We only materialize the instances of geoes:Provincia, we do not for
>> fao:territory.
>> If I understand correctly what Kingsley says, we can conditionally
>> apply inference rules via SPARQL, so our alignment triples become
>> conditional and materialize for query evaluation purposes only, using
>> Virtuoso, right? This is something that we have to check.
>
> Yes!
Yes, thank you Kingsley for pointing that out. We will also check
this, so that the alignments are effective for client applications
without our having to maintain duplicated data. Again, alignments
should be declared in a separate graph (not in the schema graph), do
you agree?
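
As a rough sketch of what I mean by "effective without duplicated
data": the alignments live in their own graph and are applied only when
a query asks for them. This Python toy is not how Virtuoso does it; the
graph IRIs and function names are hypothetical, and it applies a single
rdfs:subPropertyOf / rdfs:subClassOf step without transitive closure.

```python
# Hypothetical graph names (assumptions for illustration only).
ALIGN = "http://example.org/graph/alignments"
DATA = "http://example.org/graph/data"

# Each quad is (subject, predicate, object, graph).
QUADS = [
    ("cnr:coauthor", "rdfs:subPropertyOf", "foaf:knows", ALIGN),
    ("cnr:Researcher", "rdfs:subClassOf", "foaf:Person", ALIGN),
    ("cnrdata:AldoGangemi", "cnr:coauthor", "cnrdata:EnricoDaga", DATA),
    ("cnrdata:AldoGangemi", "rdf:type", "cnr:Researcher", DATA),
]


def entailed(quads):
    """Yield triples implied by the alignment graph (one step only)."""
    # Simplification: each term has at most one declared super-term.
    sub_prop = {s: o for (s, p, o, g) in quads
                if g == ALIGN and p == "rdfs:subPropertyOf"}
    sub_class = {s: o for (s, p, o, g) in quads
                 if g == ALIGN and p == "rdfs:subClassOf"}
    for (s, p, o, g) in quads:
        if g != DATA:
            continue
        if p in sub_prop:
            yield (s, sub_prop[p], o)
        if p == "rdf:type" and o in sub_class:
            yield (s, "rdf:type", sub_class[o])


def query(quads, pattern, use_alignments=False):
    """Match an (s, p, o) pattern; None is a wildcard. Nothing is stored."""
    triples = [(s, p, o) for (s, p, o, g) in quads if g == DATA]
    if use_alignments:
        triples.extend(entailed(quads))
    return [t for t in triples
            if all(want is None or want == got
                   for want, got in zip(pattern, t))]


without_alignments = query(QUADS, (None, "foaf:knows", None))
with_alignments = query(QUADS, (None, "foaf:knows", None), use_alignments=True)
```

A client that does not accept our alignments simply leaves them
switched off and sees only the asserted data.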

>
> Kingsley
>>
>> Best
>>
>> Boris
>>
>> P.S. BTW you can generate your sitemap files from your sparql endpoint
>> using sitemap4rdf [4] and then submit them to Google and Sindice ;-)
Thank you Boris! We will check this soon.

Bests
Enrico

>>
>>
>> [1] http://geo.linkeddata.es/
>> [2]  http://geo.linkeddata.es/ontology/Provincia
>> [3] http://geo.linkeddata.es/resource/Provincia/Madrid
>> [4] http://lab.linkeddata.deri.ie/2010/sitemap4rdf/
>>
>> On 09/02/2011 19:17, Kingsley Idehen wrote:
>>>
>>> On 2/9/11 12:57 PM, Enrico Daga wrote:
>>>>
>>>> Dear Hugh, Kingsley, all,
>>>>
>>>> thank you both for your hints.
>>>>
>>>>>> Where the dataset owner agrees that, for example, dct:creator aligns
>>>>>> with
>>>>>> pubblicazioni:autore, then perhaps you can.
>>>>>> Of course, there is a question about why dct:creator was not used in
>>>>>> the
>>>>>> first place, but it can be neat to simply use all your own properties,
>>>>>> so
>>>>>> that's OK.
>>>>>> But if the alignment is to go in the dataset, it should be part of the
>>>>>> knowledge capture process, not added by a third party.
>>>>
>>>> In our case we are the maintainer of the dataset and ontology so we
>>>> know that a triple like
>>>>
>>>> pubblicazioni:autore rdfs:subPropertyOf dct:creator
>>>>
>>>> is correct.
>>>
>>> Yes, but "correct" is one of those subjective things when working at
>>> InterWeb scale. Thus, it's important that you partition your data using Named
>>> Graphs rather than work with a single graph.
>>>
>>> The approach above allows you to see things as you seek, while letting
>>> others do the same via their specific "context lenses".
>>>
>>>> Our point is mainly related to make the dataset easily reusable by the
>>>> means of shared vocabularies even if those commonly known names have
>>>> not been used in the process of dataset generation.
>>>
>>> Yes, but if you have the vocabulary triples (TBox) in a separate Named
>>> Graph you're fine. Or you can leave everything in your main graph, but place
>>> any inter vocabulary mapping triples in a separate Named Graph.
>>>
>>>> We want our data to be self-explained providing suggested alignments
>>>> between our internal vocabulary and public ones, at least for very
>>>> common cases, such "abstract:Titolo" and "dc:title", for example.
>>>
>>> Yes, this is all clear. The key is to partition your data, there's no
>>> downside bar deflection of barbs from those who see things differently due
>>> to their specific "context lenses" when dealing with your data.
>>>
>>>>>> So if a consumer of the data wanted to assert
>>>>>> cnr:coauthor rdfs:subPropertyOf foaf:knows
>>>>>> that is up to them and would be fine, but to enforce it seems not good
>>>>>> to
>>>>>> me.
>>>>
>>>> Yes, in this case the alignment implies additional assumptions, but in
>>>> principle we need this (maybe not exactly that...) to describe the
>>>> dataset to non-cnr people.
>>>
>>> Again, that's fine, put the triples in a separate Named Graph. It won't
>>> adversely affect anything.
>>>
>>>>> In a nutshell, put the controversial stuff in its own Named Graph
>>>>> within
>>>>> your Virtuoso instance. When making Linked Data Resources (e.g. HTML
>>>>> browser
>>>>> pages) you can scope your SPARQL DESCRIBES or CONSTRUCTs to the main
>>>>> Graph
>>>>> (the one without any alignment triples etc.). The SPARQL endpoint stays
>>>>> as
>>>>> the open ended access point to all data.
>>>>
>>>> So you suggest to use a separate graph, not involved in content
>>>> negotiation but accessible through the sparql endpoint.
>>>
>>> I mean:
>>>
>>> 1. Your HTML pages (which use content negotiation and SPARQL DESCRIBE or
>>> CONSTRUCT to make Description Pages) can be scoped to the entire quad store
>>> or specific Named Graphs
>>>
>>> 2. SPARQL endpoint is always open for people to query the entire
>>> collection of graphs or specific Named Graph combos.
>>>
>>> You have to decide how you want to project your world view to the public.
>>> Bottom line, the public always has a SPARQL endpoint so they can apply their
>>> specific "context lenses", assuming you choose to have your world view
>>> (including cross vocabulary mappings) exposed in your Linked Data pages.
>>>
>>>> This solution could be good, but brings more/new questions.
>>>> Let's say we create a new dataset <http://data.cnr.it/alignments>,
>>>> what should it return?
>>>> 1) alignments at the schema level between the CNR ontology and public
>>>> well-known vocabularies, triples like "pubblicazioni:autore
>>>> rdfs:subPropertyOf dct:creator"
>>>> 2) the above plus materialized triples, for example:
>>>>
>>>>>>> cnrdata:AldoGangemi foaf:knows cnrdata:EnricoDaga
>>>>>>> cnrdata:AldoGangemi rdf:type foaf:Person
>>>>
>>>> The first would leave the interpretation of the alignment to the
>>>> client application, the second would duplicate knowledge, leading to
>>>> maintainability issues (at least in the long term).
>>>
>>> Remember, when using Virtuoso you can conditionally apply inference rules
>>> via SPARQL, so your alignment triples become conditional and materialize for
>>> query evaluation purposes only, when you leverage this aspect of Virtuoso.
>>> You don't have to fully materialize these triples.
>>>
>>>> Another point is, if we choose solution (1) (only vocabulary
>>>> alignments and no data)
>>>> - how we formally (where and with which vocabulary) connect the
>>>> dataset to its alignment?
>>>> - how machines would learn that my vocabulary, in some part, could be
>>>> interpreted as a variation of a more common set of terms?
>>>
>>> You can make all kinds of Linked Data description pages re. Virtuoso,
>>> maybe take a look at this Linked Data Deployment in 3 steps guide [1] to get
>>> a feel for how simple this has become.
>>>
>>> Links:
>>>
>>> 1.
>>> http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1642
>>> -- how to simply load data into Virtuoso and start using Linked Data pages
>>> without hassles. Makes my comments clearer once you play around a bit.
>>>
>>> Kingsley
>>>>
>>>> Bests
>>>>
>>>> Enrico
>>>>
>>>> On 9 February 2011 16:12, Kingsley Idehen<kidehen@openlinksw.com>
>>>>  wrote:
>>>>>
>>>>> On 2/9/11 8:33 AM, Hugh Glaser wrote:
>>>>>>
>>>>>> Hi Aldo,
>>>>>> Nice stuff.
>>>>>> Regarding vocabulary alignment.
>>>>>> I would suggest you might want to keep it out of your dataset.
>>>>>> Vocabulary alignment is a matter of opinion; of course your dataset is
>>>>>> opinion as well, but it is the opinion of the organisation, whereas
>>>>>> the
>>>>>> vocabulary alignment you talk about might be somebody else's opinion..
>>>>>> Where the dataset owner agrees that, for example, dct:creator aligns
>>>>>> with
>>>>>> pubblicazioni:autore, then perhaps you can.
>>>>>> Of course, there is a question about why dct:creator was not used in
>>>>>> the
>>>>>> first place, but it can be neat to simply use all your own properties,
>>>>>> so
>>>>>> that's OK.
>>>>>> But if the alignment is to go in the dataset, it should be part of the
>>>>>> knowledge capture process, not added by a third party.
>>>>>>
>>>>>> In fact, the example you choose is great.
>>>>>> It is not at all clear to me that
>>>>>> cnr:coauthor rdfs:subPropertyOf foaf:knows
>>>>>> is actually what an organisation would want to say.
>>>>>> Even with the loosest meaning of foaf:knows, there will be co-authors
>>>>>> who
>>>>>> do not foaf:knows each other (certainly in some fields).
>>>>>> And some people would be upset that their organisation was publishing
>>>>>> data
>>>>>> stating that they did.
>>>>>> (I just checked the latest edition of Nature, and the two articles
>>>>>> each
>>>>>> have upwards of 50 authors from all over the world; I'm sure many of
>>>>>> them
>>>>>> have never communicated with each other, apart from this article.)
>>>>>> One of the advantages of using your own ontology is that you are never
>>>>>> saying anything other than what you meant (whatever that might be :-)
>>>>>> )
>>>>>>
>>>>>> So if a consumer of the data wanted to assert
>>>>>> cnr:coauthor rdfs:subPropertyOf foaf:knows
>>>>>> that is up to them and would be fine, but to enforce it seems not good
>>>>>> to
>>>>>> me.
>>>>>>
>>>>>> And to help them you might provide a separate document with the
>>>>>> alignments
>>>>>> in them, so that they can pick them up if they want.
>>>>>> And our policy is to do exactly the same with the identity management
>>>>>> thing as well, which is actually a similar problem (and I would be
>>>>>> happy to
>>>>>> discuss how to do that with you, but I think we would need to go
>>>>>> off-list
>>>>>> for that, as we have had many discussions on the list about it ;-) )
>>>>>>
>>>>>> I know I haven't tackled the technical issues much, which is what you
>>>>>> are
>>>>>> asking, but I always start at the socio :-)
>>>>>
>>>>> Aldo and colleagues,
>>>>>
>>>>> Congrats re. your project!
>>>>>
>>>>> In a nutshell, put the controversial stuff in its own Named Graph
>>>>> within
>>>>> your Virtuoso instance. When making Linked Data Resources (e.g. HTML
>>>>> browser
>>>>> pages) you can scope your SPARQL DESCRIBES or CONSTRUCTs to the main
>>>>> Graph
>>>>> (the one without any alignment triples etc.). The SPARQL endpoint stays
>>>>> as
>>>>> the open ended access point to all data.
>>>>>
>>>>> This area can get artificially confusing since DBMS architectures
>>>>> differ re.
>>>>> SPARQL databases that support RDF resource import and query access. I
>>>>> embarked on a somewhat similar exercise with @danbri last week re.
>>>>> DBpedia
>>>>> and Open Archives Movies. In this case it wasn't about alignments per
>>>>> se.,
>>>>> but the fundamental principles re. partitioning and scope control are
>>>>> ultimately the same.
>>>>>
>>>>> Links:
>>>>>
>>>>> 1. http://danbri.org/words/2011/02/01/658 -- post by Danbri about the
>>>>> exercise
>>>>> 2. http://kingsley.idehen.net/c/GOK2B -- actual PivotViewer page (click
>>>>> on
>>>>> "edit" to see the SPARQL behind and note how DBpedia and Danbri's
>>>>> Graphs are
>>>>> joined)
>>>>>
>>>>> Kingsley
>>>>>>
>>>>>> Best
>>>>>> Hugh
>>>>>>
>>>>>> On 9 Feb 2011, at 09:58, Aldo Gangemi wrote:
>>>>>>
>>>>>>> Dear all, we are happy to announce the release of the beta version of
>>>>>>> data.cnr.it and the Semantic Scout exploratory browser.
>>>>>>>
>>>>>>> data.cnr.it [1] is the linked open data version of the scientific
>>>>>>> data
>>>>>>> from the Italian National Research Council, and it includes
>>>>>>> researchers,
>>>>>>> institutes, research programmes, publications, topics, etc.
>>>>>>> A Virtuoso-powered SPARQL endpoint is available at [4]; a top-down
>>>>>>> browser is available at [5]; a voiD description is at [6].
>>>>>>>
>>>>>>> The Semantic Scout [2] is an experimental exploratory browser applied
>>>>>>> to
>>>>>>> the data.cnr.it datasets, cf. a paper published at EKAW2010 [3] for
>>>>>>> details.
>>>>>>>
>>>>>>> data.cnr.it and the Semantic Scout have been designed by the Semantic
>>>>>>> Technology Lab ([7], see [8] for credits) that comprises semantic web
>>>>>>> researchers and engineers from ISTC-CNR (the Institute of Cognitive
>>>>>>> Sciences
>>>>>>> and Technologies of the Italian National Research Council), and from
>>>>>>> the
>>>>>>> Information Systems Unit of the Italian National Research Council.
>>>>>>>
>>>>>>> We have used linked data principles, and the datasets are based on
>>>>>>> modular, pattern-based designed OWL ontologies [9]. Data have been
>>>>>>> triplified from multiple CNR databases, and enriched by means of OWL
>>>>>>> reasoning (ABox materialization and classification), as well as by
>>>>>>> NLP and
>>>>>>> graph mining techniques, e.g. the topics for the researchers have
>>>>>>> been
>>>>>>> learnt by an automatic categorization system that uses researchers'
>>>>>>> textual
>>>>>>> signatures (textual records) against the textual signature (pages) of
>>>>>>> DBpedia categories.
>>>>>>>
>>>>>>> Current work is on integrating a more robust identity management and
>>>>>>> its
>>>>>>> possible integration with Okkam, a deeper voiD description of the
>>>>>>> datasets,
>>>>>>> entity linking to other LOD datasets (e.g. DBLP), more vocabulary
>>>>>>> alignment
>>>>>>> (currently limited to FOAF, SKOS, and DC), etc.
>>>>>>>
>>>>>>> Regarding the last point, we are discussing whether vocabulary
>>>>>>> alignment should be reflected in the datasets by means of
>>>>>>> materialization. This problem has pervasive consequences on the size
>>>>>>> of the
>>>>>>> services vs. datasets that enable linked data consumption: any help
>>>>>>> from the
>>>>>>> community about pros and cons of either approaches? For example, if
>>>>>>> we
>>>>>>> declare (schema level):
>>>>>>>
>>>>>>> cnr:coauthor rdfs:subPropertyOf foaf:knows
>>>>>>> cnr:Researcher rdfs:subClassOf foaf:Person
>>>>>>>
>>>>>>> and we have e.g. in the data (*simplified names*):
>>>>>>>
>>>>>>> cnrdata:AldoGangemi cnr:coauthor cnrdata:EnricoDaga
>>>>>>> cnrdata:AldoGangemi rdf:type cnr:Researcher
>>>>>>>
>>>>>>> should we materialize an additional dataset containing e.g.:
>>>>>>>
>>>>>>> cnrdata:AldoGangemi foaf:knows cnrdata:EnricoDaga
>>>>>>> cnrdata:AldoGangemi rdf:type foaf:Person
>>>>>>>
>>>>>>> or should that be provided by a SPARQL endpoint under some entailment
>>>>>>> regime?
>>>>>>>
>>>>>>> Consider that this is not only a matter of SPARQL efficiency vs.
>>>>>>> amount
>>>>>>> of data, but also of data entanglement: e.g. when materialized, the
>>>>>>> topology
>>>>>>> of linked datasets would be severely complicated by the multityping of
>>>>>>> individuals.
>>>>>>>
>>>>>>> Thanks for any advice (there does not seem to be any best practice yet)
>>>>>>> Ciao
>>>>>>> Aldo, Enrico, Alberto
>>>>>>>
>>>>>>> [1] http://data.cnr.it
>>>>>>> [2] http://bit/ly/semanticscout
>>>>>>> [3] http://data.cnr.it/site/resources
>>>>>>> [4] http://data.cnr.it/sparql/
>>>>>>> [5] http://data.cnr.it/data/cnr/individuo/CNR
>>>>>>> [6] http://data.cnr.it/data/http://data.cnr.it/dataset/
>>>>>>> [7] http://stlab.istc.cnr.it
>>>>>>> [8] http://data.cnr.it/site/contacts
>>>>>>> [9] http://data.cnr.it/site/ontology
>>>>>>>
>>>>>>>
>>>>>>> _____________________________________
>>>>>>>
>>>>>>> Aldo Gangemi
>>>>>>> Senior Researcher
>>>>>>> Semantic Technology Lab (STLab)
>>>>>>> Institute for Cognitive Science and Technology,
>>>>>>> National Research Council (ISTC-CNR)
>>>>>>> Via Nomentana 56, 00161, Roma, Italy
>>>>>>> Tel: +390644161535
>>>>>>> Fax: +390644161513
>>>>>>> aldo.gangemi@cnr.it
>>>>>>> http://www.stlab.istc.cnr.it
>>>>>>> http://www.istc.cnr.it/createhtml.php?nbr=71
>>>>>>> skype aldogangemi
>>>>>>> okkam ID: http://www.okkam.org/entity/ok200707031186131660596
>>>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Regards,
>>>>>
>>>>> Kingsley Idehen
>>>>> President & CEO
>>>>> OpenLink Software
>>>>> Web: http://www.openlinksw.com
>>>>> Weblog: http://www.openlinksw.com/blog/~kidehen
>>>>> Twitter/Identi.ca: kidehen
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>
> --
>
> Regards,
>
> Kingsley Idehen
> President & CEO
> OpenLink Software
> Web: http://www.openlinksw.com
> Weblog: http://www.openlinksw.com/blog/~kidehen
> Twitter/Identi.ca: kidehen
>
>
>
>
>
>



-- 
Enrico Daga
Technology Expert
--
National Research Council (CNR)
DCSPI-USI
P.le Aldo Moro 7 - Rome, Italy
Tel +39 4993 3321
--
Semantic Technology Laboratory (STLab)
http://stlab.istc.cnr.it/stlab/User:EnricoDaga
--
skype: enri-pan
Received on Tuesday, 15 February 2011 15:40:54 UTC