Re: ANN: LDIF - Linked Data Integration Framework V0.1 released.

Hi Chris,

Thanks for the fast and detailed reply, it's a very interesting discussion.

Indeed, there are several ways to approach mapping and identity resolution.
But what strikes me is that people in the community seem to be insufficiently aware of the possibilities and performance of current reasoners.

> As you can see, the data translation requires lots of structural
> transformations as well as complex property value transformations using
> various functions. All of these are things that current logical
> formalisms are not very good at.

Oh yes, they are. All of the transformations needed in your paper can be performed by at least two reasoners, cwm [1] and EYE [2], using built-ins [3]. These include regular expressions, datatype transformations…
Frankly, every transformation in the R2R example can be expressed as an N3 rule.
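
To give a concrete feel for this (the src: vocabulary and property names below are invented for illustration, not taken from your paper), here is a rough sketch of such a rule, using the string:concatenation built-in that both cwm and EYE support:

  @prefix string: <http://www.w3.org/2000/10/swap/string#> .
  @prefix foaf:   <http://xmlns.com/foaf/0.1/> .
  @prefix src:    <http://example.org/sourceVocabulary#> .  # hypothetical source vocabulary

  # Structural transformation (two source properties folded into one target
  # property) combined with a value transformation (string concatenation).
  {
    ?person src:firstName ?first .
    ?person src:lastName  ?last .
    (?first " " ?last) string:concatenation ?fullName .
  }
  =>
  {
    ?person foaf:name ?fullName .
  } .

Running this with cwm --think, for instance, fires the rule and adds the derived foaf:name triples to the graph, so the translated data ends up right next to the source data.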

> If I as an application developer
> want to get a job done, how does it help me to exchange mappings
> between different tools that all fail to get the job done?

Because different tools can contribute different results, and if you use a common language and idiom, they can all work with the same data and metadata.

> more and more developers know SPARQL, which makes it easier for them to learn R2R.

The developers who know SPARQL are a proper subset of those who know plain RDF, which is what I suggest using. And even if rules are necessary, N3 is only a small extension of RDF.
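
To make the "small extension" point concrete (vocabulary again invented for illustration): a simple property mapping is just one triple of plain RDF, and the rule form only adds quoted graphs and variables on top of that:

  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
  @prefix src:  <http://example.org/sourceVocabulary#> .  # hypothetical source vocabulary

  # Plain RDF: a simple mapping, no rule needed at all.
  src:label rdfs:subPropertyOf rdfs:label .

  # The same mapping written as an N3 rule: the only additions to RDF
  # are the quoted graphs { } and the universally quantified ?variables.
  { ?x src:label ?value . } => { ?x rdfs:label ?value . } .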

> from our experience with the BSBM Benchmark we have the feeling that SPARQL
> engines are more suitable for this task than current reasoning engines, due
> to their performance problems as well as problems dealing with inconsistent
> data.

The extremely solid performance [4] of EYE is too little known: it achieves in linear time what other reasoners can never solve.

But my main point is semantics. Why create a new system with its own meanings and interpretations, when there is so much that can be done with plain RDF and its widely known vocabularies (RDFS, OWL)?
Ironically, a tool that contributes to the reconciliation of different RDF sources does not itself use common vocabularies to express well-known relationships.
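
For instance (again with an invented source vocabulary), the well-known relationships could simply be published as RDFS/OWL statements that any off-the-shelf reasoner or RDF tool can pick up:

  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
  @prefix owl:  <http://www.w3.org/2002/07/owl#> .
  @prefix foaf: <http://xmlns.com/foaf/0.1/> .
  @prefix ex:   <http://example.org/someSourceVocabulary#> .  # hypothetical source vocabulary

  # Well-known relationships expressed with the standard vocabularies,
  # so the mapping itself is ordinary Linked Data.
  ex:Person       owl:equivalentClass    foaf:Person .
  ex:emailAddress owl:equivalentProperty foaf:mbox .
  ex:fullName     rdfs:subPropertyOf     foaf:name .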

Cheers,

Ruben

[1] http://www.w3.org/2000/10/swap/doc/cwm.html
[2] http://eulersharp.sourceforge.net/
[3] http://www.w3.org/2000/10/swap/doc/CwmBuiltins
[4] http://eulersharp.sourceforge.net/2003/03swap/dtb-2010.txt

On 30 Jun 2011, at 10:51, Chris Bizer wrote:

> Hi Ruben,
> 
> thank you for your detailed feedback.
> 
> Of course it is always a question of taste how you prefer to express data
> translation rules and I agree that simple mappings can also be expressed
> using standard OWL constructs.
> 
> When designing the R2R mapping language, we first analyzed the real-world
> requirements that arise if you try to properly integrate data from existing
> Linked Data on the Web. We summarize our findings in Section 5 of the
> following paper:
> http://www.wiwiss.fu-berlin.de/en/institute/pwo/bizer/research/publications/BizerSchultz-COLD-R2R-Paper.pdf
> As you can see, the data translation requires lots of structural
> transformations as well as complex property value transformations using
> various functions. All of these are things that current logical
> formalisms are not very good at.
> 
> Other reasons why we chose to base the mapping language on SPARQL were:
> 
> 1. more and more developers know SPARQL, which makes it easier for them to
> learn R2R.
> 2. we want to be able to translate large amounts (billions of triples in
> the mid-term) of messy, inconsistent Web data, and from our experience with
> the BSBM Benchmark we have the feeling that SPARQL engines are more suitable
> for this task than current reasoning engines, due to their performance
> problems as well as problems dealing with inconsistent data.
> 
> I disagree with you that R2R mappings are not suitable for being exchanged
> on the Web. On the contrary, they were specifically designed to be published
> and discovered on the Web, and they allow partial mappings from different
> sources to be easily combined (see the paper above for details).
> 
> I think your argument about the portability of mappings between different
> tools is currently only partially valid. If I as an application developer
> want to get a job done, how does it help me to exchange mappings between
> different tools that all fail to get the job done?
> 
> Also note that with LDIF we aim to provide identity resolution in addition
> to schema mapping. It is well known that identity resolution in practical
> settings requires rather complex matching heuristics (see the Silk papers
> for details about the different matchers that are usually employed), and
> identity resolution is again a topic where reasoning engines don't have too
> much to offer.
> 
> But again, there are different ways and tastes when it comes to expressing
> mapping rules and identity resolution heuristics. R2R and Silk LSL are our
> approaches to getting the job done, and we are of course happy if other
> people provide working solutions for the task of integrating and cleansing
> messy data from the Web of Linked Data; we would be happy to compare our
> approach with theirs.
> 
> Cheers,
> 
> Chris
