- From: Kingsley Idehen <kidehen@openlinksw.com>
- Date: Wed, 09 Feb 2011 17:31:44 -0500
- To: bvilla@delicias.dia.fi.upm.es
- CC: Aldo Gangemi <aldo.gangemi@cnr.it>, enrico.daga@cnr.it, Hugh Glaser <hg@ecs.soton.ac.uk>, Linked Data community <public-lod@w3.org>, Alberto Salvati <alberto.salvati@cnr.it>
On 2/9/11 5:01 PM, Boris Villazón Terrazas wrote:
> Hi Aldo et al.
>
> Nice stuff! ;-)
>
> Regarding your question, I can tell you what we did within the context of GeoLinkedData [1].
> We have separated the model/vocabulary from the data, so we have the model/vocabulary in one Named Graph and the data in another Named Graph.
> According to Kingsley, it seems we are going in the right direction .... Thanks Kingsley
>
> We have similar cases to yours, for example
> geoes:Provincia rdfs:subClassOf fao:territory [2]
> and a particular resource of type geoes:Provincia can be geores:Madrid [3]
>
> We only materialize the instances of geoes:Provincia; we do not for fao:territory.
> If I understand correctly what Kingsley says, we can conditionally apply inference rules via SPARQL, so our alignment triples become conditional and materialize for query evaluation purposes only, using Virtuoso, right? This is something that we have to check.

Yes!

Kingsley
>
> Best
>
> Boris
>
> P.S. BTW you can generate your sitemap files from your sparql endpoint using sitemap4rdf [4] and then submit them to Google and Sindice ;-)
>
>
> [1] http://geo.linkeddata.es/
> [2] http://geo.linkeddata.es/ontology/Provincia
> [3] http://geo.linkeddata.es/resource/Provincia/Madrid
> [4] http://lab.linkeddata.deri.ie/2010/sitemap4rdf/
>
> On 09/02/2011 19:17, Kingsley Idehen wrote:
>> On 2/9/11 12:57 PM, Enrico Daga wrote:
>>> Dear Hugh, Kingsley, all,
>>>
>>> thank you both for your hints.
>>>
>>>>> Where the dataset owner agrees that, for example, dct:creator aligns with pubblicazioni:autore, then perhaps you can.
>>>>> Of course, there is a question about why dct:creator was not used in the first place, but it can be neat to simply use all your own properties, so that's OK.
>>>>> But if the alignment is to go in the dataset, it should be part of the knowledge capture process, not added by a third party.
>>> In our case we are the maintainer of the dataset and ontology so we know that a triple like
>>>
>>> pubblicazioni:autore rdfs:subClassOf dct:creator
>>>
>>> is correct.
>>
>> Yes, but "correct" is one of those subjective things when working at InterWeb scale. Thus, it's important that you partition your data using Named Graphs rather than work with a single graph.
>>
>> The approach above allows you to see things as you seek, while letting others do the same via their specific "context lenses".
>>
>>> Our point is mainly related to making the dataset easily reusable by means of shared vocabularies, even if those commonly known names have not been used in the process of dataset generation.
>>
>> Yes, but if you have the vocabulary triples (TBox) in a separate Named Graph you're fine. Or you can leave everything in your main graph, but place any inter-vocabulary mapping triples in a separate Named Graph.
>>
>>> We want our data to be self-explained, providing suggested alignments between our internal vocabulary and public ones, at least for very common cases, such as "abstract:Titolo" and "dc:title", for example.
>>
>> Yes, this is all clear. The key is to partition your data, there's no downside bar deflection of barbs from those who see things differently due to their specific "context lenses" when dealing with your data.
>>
>>>>> So if a consumer of the data wanted to assert
>>>>> cnr:coauthor rdfs:subPropertyOf foaf:knows
>>>>> that is up to them and would be fine, but to enforce it seems not good to me.
>>> Yes, in this case the alignment implies additional assumptions, but in principle we need this (maybe not exactly that...) to describe the dataset to non-cnr people.
>>
>> Again, that's fine, put the triples in a separate Named Graph. It won't adversely affect anything.
>>
>>>> In a nutshell, put the controversial stuff in its own Named Graph within your Virtuoso instance.
>>>> When making Linked Data Resources (e.g. HTML browser pages) you can scope your SPARQL DESCRIBEs or CONSTRUCTs to the main Graph (the one without any alignment triples etc.). The SPARQL endpoint stays as the open-ended access point to all data.
>>> So you suggest using a separate graph, not involved in content negotiation but accessible through the sparql endpoint.
>>
>> I mean:
>>
>> 1. Your HTML pages (which use content negotiation and SPARQL DESCRIBE or CONSTRUCT to make Description Pages) can be scoped to the entire quad store or specific Named Graphs
>>
>> 2. SPARQL endpoint is always open for people to query the entire collection of graphs or specific Named Graph combos.
>>
>> You have to decide how you want to project your world view to the public. Bottom line, the public always has a SPARQL endpoint so they can apply their specific "context lenses", assuming you choose to have your world view (including cross-vocabulary mappings) exposed in your Linked Data pages.
>>
>>> This solution could be good, but brings more/new questions.
>>> Let's say we create a new dataset <http://data.cnr.it/alignments>, what should it return?
>>> 1) alignments at the schema level between the CNR ontology and public well-known vocabularies, triples like "pubblicazioni:autore rdfs:subClassOf dct:creator"
>>> 2) the above plus materialized triples, for example:
>>>
>>>>>> cnrdata:AldoGangemi foaf:knows cnrdata:EnricoDaga
>>>>>> cnrdata:AldoGangemi rdf:type foaf:Person
>>> The first would leave the interpretation of the alignment to the client application, the second would duplicate knowledge, leading to maintainability issues (at least in the long term).
>>
>> Remember, when using Virtuoso you can conditionally apply inference rules via SPARQL, so your alignment triples become conditional and materialize for query evaluation purposes only, when you leverage this aspect of Virtuoso.
>> You don't have to fully materialize these triples.
>>
>>> Another point is, if we choose solution (1) (only vocabulary alignments and no data)
>>> - how do we formally (where and with which vocabulary) connect the dataset to its alignment?
>>> - how would machines learn that my vocabulary, in some part, could be interpreted as a variation of a more common set of terms?
>>
>> You can make all kinds of Linked Data description pages re. Virtuoso, maybe take a look at this Linked Data Deployment in 3 steps guide [1] to get a feel for how simple this has become.
>>
>> Links:
>>
>> 1. http://www.openlinksw.com/dataspace/kidehen@openlinksw.com/weblog/kidehen@openlinksw.com%27s%20BLOG%20%5B127%5D/1642 -- how to simply load data into Virtuoso and start using Linked Data pages without hassles. Makes my comments clearer once you play around a bit.
>>
>> Kingsley
>>> Bests
>>>
>>> Enrico
>>>
>>> On 9 February 2011 16:12, Kingsley Idehen <kidehen@openlinksw.com> wrote:
>>>> On 2/9/11 8:33 AM, Hugh Glaser wrote:
>>>>> Hi Aldo,
>>>>> Nice stuff.
>>>>> Regarding vocabulary alignment.
>>>>> I would suggest you might want to keep it out of your dataset.
>>>>> Vocabulary alignment is a matter of opinion; of course your dataset is opinion as well, but it is the opinion of the organisation, whereas the vocabulary alignment you talk about might be somebody else's opinion.
>>>>> Where the dataset owner agrees that, for example, dct:creator aligns with pubblicazioni:autore, then perhaps you can.
>>>>> Of course, there is a question about why dct:creator was not used in the first place, but it can be neat to simply use all your own properties, so that's OK.
>>>>> But if the alignment is to go in the dataset, it should be part of the knowledge capture process, not added by a third party.
>>>>>
>>>>> In fact, the example you choose is great.
>>>>> It is not at all clear to me that
>>>>> cnr:coauthor rdfs:subPropertyOf foaf:knows
>>>>> is actually what an organisation would want to say.
>>>>> Even with the loosest meaning of foaf:knows, there will be co-authors who do not foaf:knows each other (certainly in some fields).
>>>>> And some people would be upset that their organisation was publishing data stating that they did.
>>>>> (I just checked the latest edition of Nature, and the two articles each have upwards of 50 authors from all over the world; I'm sure many of them have never communicated with each other, apart from this article.)
>>>>> One of the advantages of using your own ontology is that you are never saying anything other than what you meant (whatever that might be :-) )
>>>>>
>>>>> So if a consumer of the data wanted to assert
>>>>> cnr:coauthor rdfs:subPropertyOf foaf:knows
>>>>> that is up to them and would be fine, but to enforce it seems not good to me.
>>>>>
>>>>> And to help them you might provide a separate document with the alignments in them, so that they can pick them up if they want.
>>>>> And our policy is to do exactly the same with the identity management thing as well, which is actually a similar problem (and I would be happy to discuss how to do that with you, but I think we would need to go off-list for that, as we have had many discussions on the list about it ;-) )
>>>>>
>>>>> I know I haven't tackled the technical issues much, which is what you are asking, but I always start at the socio :-)
>>>> Aldo and colleagues,
>>>>
>>>> Congrats re. your project!
>>>>
>>>> In a nutshell, put the controversial stuff in its own Named Graph within your Virtuoso instance. When making Linked Data Resources (e.g.
>>>> HTML browser pages) you can scope your SPARQL DESCRIBEs or CONSTRUCTs to the main Graph (the one without any alignment triples etc.). The SPARQL endpoint stays as the open-ended access point to all data.
>>>>
>>>> This area can get artificially confusing since DBMS architectures differ re. SPARQL databases that support RDF resource import and query access. I embarked on a somewhat similar exercise with @danbri last week re. DBpedia and Open Archives Movies. In this case it wasn't about alignments per se, but the fundamental principles re. partitioning and scope control are ultimately the same.
>>>>
>>>> Links:
>>>>
>>>> 1. http://danbri.org/words/2011/02/01/658 -- post by Danbri about the exercise
>>>> 2. http://kingsley.idehen.net/c/GOK2B -- actual PivotViewer page (click on "edit" to see the SPARQL behind and note how DBpedia and Danbri's Graphs are joined)
>>>>
>>>> Kingsley
>>>>> Best
>>>>> Hugh
>>>>>
>>>>> On 9 Feb 2011, at 09:58, Aldo Gangemi wrote:
>>>>>
>>>>>> Dear all, we are happy to announce the release of the beta version of data.cnr.it and the Semantic Scout exploratory browser.
>>>>>>
>>>>>> data.cnr.it [1] is the linked open data version of the scientific data from the Italian National Research Council, and it includes researchers, institutes, research programmes, publications, topics, etc.
>>>>>> A Virtuoso-powered SPARQL endpoint is available at [4]; a top-down browser is available at [5]; a voiD description is at [6].
>>>>>>
>>>>>> The Semantic Scout [2] is an experimental exploratory browser applied to the data.cnr.it datasets, cf. a paper published at EKAW2010 [3] for details.
>>>>>>
>>>>>> data.cnr.it and the Semantic Scout have been designed by the Semantic Technology Lab ([7], see [8] for credits) that comprises semantic web researchers and engineers from ISTC-CNR (the Institute of Cognitive Sciences and Technologies of the Italian National Research Council), and from the Information Systems Unit of the Italian National Research Council.
>>>>>>
>>>>>> We have used linked data principles, and the datasets are based on modular, pattern-based designed OWL ontologies [9]. Data have been triplified from multiple CNR databases, and enriched by means of OWL reasoning (ABox materialization and classification), as well as by NLP and graph mining techniques, e.g. the topics for the researchers have been learnt by an automatic categorization system that uses researchers' textual signatures (textual records) against the textual signature (pages) of DBpedia categories.
>>>>>>
>>>>>> Current work is on integrating a more robust identity management and its possible integration with Okkam, a deeper voiD description of the datasets, entity linking to other LOD datasets (e.g. DBLP), more vocabulary alignment (currently limited to FOAF, SKOS, and DC), etc.
>>>>>>
>>>>>> Regarding the last point, we are discussing the problem of whether vocabulary alignment should be reflected or not in the datasets by means of materialization. This problem has pervasive consequences on the size of the services vs. datasets that enable linked data consumption: any help from the community about pros and cons of either approach? For example, if we declare (schema level):
>>>>>>
>>>>>> cnr:coauthor rdfs:subPropertyOf foaf:knows
>>>>>> cnr:Researcher rdfs:subClassOf foaf:Person
>>>>>>
>>>>>> and we have e.g.
>>>>>> in the data (*simplified names*):
>>>>>>
>>>>>> cnrdata:AldoGangemi cnr:coauthor cnrdata:EnricoDaga
>>>>>> cnrdata:AldoGangemi rdf:type cnr:Researcher
>>>>>>
>>>>>> should we materialize an additional dataset containing e.g.:
>>>>>>
>>>>>> cnrdata:AldoGangemi foaf:knows cnrdata:EnricoDaga
>>>>>> cnrdata:AldoGangemi rdf:type foaf:Person
>>>>>>
>>>>>> or should that be provided by a SPARQL endpoint under some entailment regime?
>>>>>>
>>>>>> Consider that this is not only a matter of SPARQL efficiency vs. amount of data, but also of data entanglement: e.g. when materialized, the topology of linked datasets would be severely complicated by the multityping of individuals.
>>>>>>
>>>>>> Thanks for any advice (there does not seem to be any best practice yet)
>>>>>> Ciao
>>>>>> Aldo, Enrico, Alberto
>>>>>>
>>>>>> [1] http://data.cnr.it
>>>>>> [2] http://bit/ly/semanticscout
>>>>>> [3] http://data.cnr.it/site/resources
>>>>>> [4] http://data.cnr.it/sparql/
>>>>>> [5] http://data.cnr.it/data/cnr/individuo/CNR
>>>>>> [6] http://data.cnr.it/data/http://data.cnr.it/dataset/
>>>>>> [7] http://stlab.istc.cnr.it
>>>>>> [8] http://data.cnr.it/site/contacts
>>>>>> [9] http://data.cnr.it/site/ontology
>>>>>>
>>>>>>
>>>>>> _____________________________________
>>>>>>
>>>>>> Aldo Gangemi
>>>>>> Senior Researcher
>>>>>> Semantic Technology Lab (STLab)
>>>>>> Institute for Cognitive Science and Technology,
>>>>>> National Research Council (ISTC-CNR)
>>>>>> Via Nomentana 56, 00161, Roma, Italy
>>>>>> Tel: +390644161535
>>>>>> Fax: +390644161513
>>>>>> aldo.gangemi@cnr.it
>>>>>> http://www.stlab.istc.cnr.it
>>>>>> http://www.istc.cnr.it/createhtml.php?nbr=71
>>>>>> skype aldogangemi
>>>>>> okkam ID: http://www.okkam.org/entity/ok200707031186131660596
>>>>>>
>>>>
>>>> --
>>>>
>>>> Regards,
>>>>
>>>> Kingsley Idehen
>>>> President & CEO
>>>> OpenLink Software
>>>> Web: http://www.openlinksw.com
>>>> Weblog:
>>>> http://www.openlinksw.com/blog/~kidehen
>>>> Twitter/Identi.ca: kidehen
>>>>
>>>
>>
>
--

Regards,

Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
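
The partitioning idea discussed in this thread (data in one Named Graph, alignment triples in another, entailments computed only at query time) can be sketched with a toy quad store in plain Python. This is only an illustration of the principle, not Virtuoso's mechanism or API: the two graph names, the `QUADS` set, and the `triples_in` and `query` helpers are all hypothetical, and only the two RDFS rules for rdfs:subPropertyOf and rdfs:subClassOf are applied.

```python
# Illustrative sketch only: a toy in-memory quad store, NOT Virtuoso.
# The graph names and helper functions below are hypothetical.

DATA_GRAPH = "http://data.cnr.it/graph/data"          # hypothetical graph name
ALIGN_GRAPH = "http://data.cnr.it/graph/alignments"   # hypothetical graph name

# Quads are (graph, subject, predicate, object); triples come from the thread.
QUADS = {
    (DATA_GRAPH, "cnrdata:AldoGangemi", "cnr:coauthor", "cnrdata:EnricoDaga"),
    (DATA_GRAPH, "cnrdata:AldoGangemi", "rdf:type", "cnr:Researcher"),
    (ALIGN_GRAPH, "cnr:coauthor", "rdfs:subPropertyOf", "foaf:knows"),
    (ALIGN_GRAPH, "cnr:Researcher", "rdfs:subClassOf", "foaf:Person"),
}


def triples_in(graphs):
    """Return the triples stored in the selected named graphs."""
    return {(s, p, o) for (g, s, p, o) in QUADS if g in graphs}


def query(graphs, infer=False):
    """Return the triples visible in `graphs`, optionally with entailments.

    Entailed triples are computed per call and never written back to QUADS,
    so nothing is materialized in the store.
    """
    result = triples_in(graphs)
    if not infer:
        return result
    changed = True
    while changed:  # repeat until no new triple appears (handles chains)
        sub_props = {(s, o) for (s, p, o) in result if p == "rdfs:subPropertyOf"}
        sub_classes = {(s, o) for (s, p, o) in result if p == "rdfs:subClassOf"}
        new = set()
        for (s, p, o) in result:
            for (sub, sup) in sub_props:
                if p == sub:
                    new.add((s, sup, o))  # (s p o), p subPropertyOf q => (s q o)
            if p == "rdf:type":
                for (sub, sup) in sub_classes:
                    if o == sub:
                        new.add((s, "rdf:type", sup))  # type propagates upward
        changed = not new <= result
        result = result | new
    return result


knows = ("cnrdata:AldoGangemi", "foaf:knows", "cnrdata:EnricoDaga")
print(knows in query({DATA_GRAPH}))                           # False
print(knows in query({DATA_GRAPH, ALIGN_GRAPH}, infer=True))  # True
```

A consumer who rejects the alignment simply leaves `ALIGN_GRAPH` out of the scope and sees only the asserted data; one who accepts it gets the foaf:knows and foaf:Person triples without the publisher storing them.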
Received on Wednesday, 9 February 2011 22:34:21 UTC