- From: Gianluca Correndo <gc3@ecs.soton.ac.uk>
- Date: Thu, 30 Jun 2011 17:39:22 +0100
- To: public-lod@w3.org, "'Semantic Web'" <semantic-web@w3.org>, semanticweb@yahoogroups.com
Hi,

I thought I could share some remarks on the topic. First of all, well done on the release of LDIF; it's an interesting piece of work and it's dearly needed. I have started to release a bit of my own work too, although it's at a very early stage (https://github.com/correndo/mediation).

On 6/30/11 10:49 AM, Ruben Verborgh wrote:
> Hi Chris,
>
> Thanks for the fast and detailed reply, it's a very interesting discussion.
>
> Indeed, there are several ways for mapping and identity resolution.
> But what strikes me is that people in the community seem to be insufficiently aware of the possibilities and performance of current reasoners.

About identity resolution: Silk is a nice framework for discovering equivalences between identities, although I think that for exploiting such equivalences a more distributed approach should be preferred. An approach where the links among entities are discovered (no matter with what tool) and *shared* would be more organic to an architecture of distributed data publishing.

About reasoners: I guess on this issue one could distinguish by where a given reasoner is applied. Within Linked Data, where the amount of data is assumed to be huge, applying a reasoner is usually felt to be impractical. Reasoners just don't scale as well as one would like, although some triple stores are achieving good performance (owlim, 4sr and others).

>> As you can see the data translation requires lots of structural
>> transformations as well as complex property value transformations using
>> various functions. All things where current logical formalisms are not very
>> good at.
>
> Oh yes, they are. All needed transformations in your paper can be performed by at least two reasoners: cwm [1] and EYE [2] by using built-ins [3]. Included are regular expressions, datatype transforms…
> Frankly, every transform in the R2R example can be expressed as an N3 rule.
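As a concrete illustration of that last point, the class mapping discussed later in this thread could be sketched as an N3 rule roughly as follows. This is only a sketch: the prefix URIs are illustrative placeholders, not the actual namespaces used by R2R, cwm, or EYE.

```n3
@prefix genes:  <http://example.org/genes#> .    # illustrative namespace
@prefix smwcat: <http://example.org/smwcat#> .   # illustrative namespace

# Everything typed as genes:gene is also typed as smwcat:Gene.
{ ?s a genes:gene . } => { ?s a smwcat:Gene . } .
```

A rule like this is itself an RDF-family document, which is the crux of Ruben's argument: the mapping can be published and consumed with the same machinery as the data it maps.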
Logic formalisms can be applied to data structural transformation, although it sounds a bit of an overkill. I think the real issue here is to find the right tool for the right job. If we have heavyweight ontologies that differ conceptually from one another, then a reasoner is the right tool. But what if we're dealing with different data schemas that don't require complex reasoning?

There are, I think, two different levels that can be aligned by two different formalisms: RDF and OWL.

Aligning RDF graphs is something that has little to do with description logics; the semantics is inscribed in the structure, and structural alignments are therefore called for. A preliminary work I published in [1] was based on graph rewriting; it handles query rewriting and it was intended to be a lightweight approach (schema alignment).

On the use of pattern literals: it's a bit like using RDF to describe a string whose content's semantics is defined elsewhere. It just doesn't sound right. But again, even using RDF and reification to describe at least the basic graph patterns [1] doesn't solve the problem of semantic elicitation: the interpretation of an alignment is still relative to a particular tool.

So, instead of writing literals like this:

    mp:Gene
        r2r:sourcePattern "?SUBJ a genes:gene" ;
        r2r:targetPattern "?SUBJ a smwcat:Gene" .

I would have written a chunk of RDF pattern graph like this:

    [] mediation:lhs [ a rdf:Statement ;
                       rdf:subject   _:SUBJ ;
                       rdf:predicate rdf:type ;
                       rdf:object    genes:gene ] ;
       mediation:rhs [ a rdf:Statement ;
                       rdf:subject   _:SUBJ ;
                       rdf:predicate rdf:type ;
                       rdf:object    smwcat:Gene ] .

For aligning OWL ontologies there have been a number of proposals, EDOAL [2] and C-OWL [3] to name a few, not counting the already mentioned properties defined in OWL (owl:sameAs, owl:equivalentProperty, owl:equivalentClass). The question for any OWL alignment formalism is rather to find different profiles of complexity that fit different application cases.
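For comparison, here is roughly how the same class alignment looks when expressed as a plain SPARQL query, which is close to the idiom that SPARQL-based mapping approaches such as R2R build on. Again a hedged sketch: the prefix URIs are illustrative, and this is ordinary SPARQL rather than actual R2R syntax.

```sparql
PREFIX genes:  <http://example.org/genes#>    # illustrative namespace
PREFIX smwcat: <http://example.org/smwcat#>   # illustrative namespace

# Translate every instance of the source class into the target class.
CONSTRUCT { ?s a smwcat:Gene . }
WHERE     { ?s a genes:gene . }
```

All three forms (pattern literal, reified pattern graph, SPARQL query) say the same thing; the point under debate in this thread is whether the mapping's semantics lives in a shared formalism or in the tool that interprets it.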
[1] http://eprints.ecs.soton.ac.uk/18370/
[2] http://alignapi.gforge.inria.fr/edoal.html
[3] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.73.9326&rep=rep1&type=pdf

>> If I as an application developer
>> want to get a job done, what does it help me if I can exchange mappings
>> between different tools that all don't get the job done?
>
> Because different tools can contribute different results, and if you use a common language and idiom, they can all work with the same data and metadata.
>
>> more and more developers know SPARQL which makes it easier for them to learn R2R.
>
> The developers that know SPARQL form a proper subset of those that know plain RDF, which is what I suggest using. And even if rules are necessary, N3 is only a small extension of RDF.
>
>> Benchmark we have the feeling that SPARQL engines are more suitable for
>> this task than current reasoning engines due to their performance problems
>> as well as problems to deal with inconsistent data.
>
> The extremely solid performance [4] of EYE is too little known. It can achieve things in linear time that other reasoners can never solve.
>
> But my main point is semantics. Why make a new system with its own meanings and interpretations, when there is so much to do with plain RDF and its widely known vocabularies (RDFS, OWL)?
> Ironically, a tool which contributes to the reconciliation of different RDF sources does not use common vocabularies to express well-known relationships.
>
> Cheers,
>
> Ruben
>
> [1] http://www.w3.org/2000/10/swap/doc/cwm.html
> [2] http://eulersharp.sourceforge.net/
> [3] http://www.w3.org/2000/10/swap/doc/CwmBuiltins
> [4] http://eulersharp.sourceforge.net/2003/03swap/dtb-2010.txt
>
> On 30 Jun 2011, at 10:51, Chris Bizer wrote:
>
>> Hi Ruben,
>>
>> thank you for your detailed feedback.
>>
>> Of course it is always a question of taste how you prefer to express data
>> translation rules, and I agree that simple mappings can also be expressed
>> using standard OWL constructs.
>>
>> When designing the R2R mapping language, we first analyzed the real-world
>> requirements that arise if you try to properly integrate data from existing
>> Linked Data on the Web. We summarize our findings in Section 5 of the
>> following paper:
>> http://www.wiwiss.fu-berlin.de/en/institute/pwo/bizer/research/publications/BizerSchultz-COLD-R2R-Paper.pdf
>> As you can see, the data translation requires lots of structural
>> transformations as well as complex property value transformations using
>> various functions. All things that current logical formalisms are not very
>> good at.
>>
>> Other reasons why we chose to base the mapping language on SPARQL were
>> that:
>>
>> 1. more and more developers know SPARQL, which makes it easier for them to
>> learn R2R.
>> 2. we want to be able to translate large amounts (billions of triples in the
>> mid-term) of messy, inconsistent Web data, and from our experience with the
>> BSBM Benchmark we have the feeling that SPARQL engines are more suitable for
>> this task than current reasoning engines, due to their performance problems
>> as well as problems dealing with inconsistent data.
>>
>> I disagree with you that R2R mappings are not suitable for being exchanged
>> on the Web. On the contrary, they were especially designed for being published
>> and discovered on the Web, and they allow partial mappings from different sources
>> to be easily combined (see the paper above for details).
>>
>> I think your argument about the portability of mappings between different
>> tools is currently only partially valid. If I as an application developer
>> want to get a job done, what does it help me if I can exchange mappings
>> between different tools that all don't get the job done?
>>
>> Also note that we aim with LDIF to provide for identity resolution in
>> addition to schema mapping. It is well known that identity resolution in
>> practical settings requires rather complex matching heuristics (see the Silk
>> papers for details about the different matchers that are usually employed), and
>> identity resolution is again a topic where reasoning engines don't have too
>> much to offer.
>>
>> But again, there are different ways and tastes about how to express mapping
>> rules and identity resolution heuristics. R2R and Silk LSL are our
>> approaches to getting the job done. We are of course happy if other
>> people provide working solutions for the task of integrating and cleansing
>> messy data from the Web of Linked Data, and we are happy to compare our approach
>> with theirs.
>>
>> Cheers,
>>
>> Chris

--
******************************************
Gianluca Correndo
Research fellow IAM group
Electronic and Computer Science
University of Southampton
******************************************
Received on Friday, 1 July 2011 08:33:47 UTC