Re: ANN: LDIF - Linked Data Integration Framework V0.1 released. from Chris Bizer on 2011-07-01 (public-lod@w3.org from July 2011)

From: Chris Bizer <chris@bizer.de>
Date: Fri, 1 Jul 2011 08:34:06 +0200
To: "'Ruben Verborgh'" <ruben.verborgh@ugent.be>
Cc: "'public-lod'" <public-lod@w3.org>, "'Semantic Web'" <semantic-web@w3.org>, <semanticweb@yahoogroups.com>
Message-ID: <015e01cc37b8$e0a28840$a1e798c0$@de>

Hi Ruben,

> The important thing here is that the R2R patterns can be generated from
> regular RDFS and OWL constructs (because these have a well-defined
> meaning!), while the other way round is difficult and impossible in general.
> If your (or anyone else's) software needs a different representation,
> why not create it from RDF documents that use those Semantic Web
> foundations instead of forcing people to write those instructions?

For simple mappings that can be expressed using standard terms such as
owl:equivalentClass, owl:equivalentProperty, rdfs:subClassOf or
rdfs:subPropertyOf, we don't force people to write R2R syntax.

The R2R framework understands these constructs, and when loading mappings
from a file or the Web, the framework simply rewrites these standard terms
into the equivalent internal R2R representation.
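
To illustrate the idea of that rewriting step, here is a minimal sketch in
Python. It is not the actual R2R code: the names PatternMapping and
rewrite_standard_term, and the exact pattern strings, are assumptions made
up for this example. It only shows how one statement using a standard term
could be turned into a source-pattern/target-pattern pair.

```python
from typing import List, NamedTuple

OWL = "http://www.w3.org/2002/07/owl#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


class PatternMapping(NamedTuple):
    """Hypothetical stand-in for an internal pattern-based mapping."""
    source_pattern: str
    target_pattern: str


# Standard terms the sketch understands, grouped by what they relate.
CLASS_TERMS = {OWL + "equivalentClass", RDFS + "subClassOf"}
PROPERTY_TERMS = {OWL + "equivalentProperty", RDFS + "subPropertyOf"}
# Equivalences hold in both directions, so they yield two mappings.
EQUIVALENCE_TERMS = {OWL + "equivalentClass", OWL + "equivalentProperty"}


def _pattern(term: str, is_class: bool) -> str:
    # Assumed pattern notation: a single triple pattern over ?SUBJ.
    return f"?SUBJ a <{term}>" if is_class else f"?SUBJ <{term}> ?o"


def rewrite_standard_term(subject: str, predicate: str, obj: str) -> List[PatternMapping]:
    """Rewrite one RDFS/OWL statement into pattern mappings (illustrative only)."""
    if predicate in CLASS_TERMS:
        is_class = True
    elif predicate in PROPERTY_TERMS:
        is_class = False
    else:
        return []  # not a standard mapping term; nothing to rewrite

    # subject -> object: data using the subject term is expressed with the object term.
    mappings = [PatternMapping(_pattern(subject, is_class), _pattern(obj, is_class))]
    if predicate in EQUIVALENCE_TERMS:
        mappings.append(PatternMapping(_pattern(obj, is_class), _pattern(subject, is_class)))
    return mappings
```

In this sketch an rdfs:subClassOf or rdfs:subPropertyOf statement yields one
mapping (from the more specific term to the more general one), while the
owl:equivalent* terms yield one mapping per direction.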

So we build on the existing standards, but decided that for complex
mappings that require structural transformations and value transformation
functions, we prefer a graph-pattern-based syntax.

Best,

Chris

-----Original Message-----
From: Ruben Verborgh [mailto:ruben.verborgh@ugent.be]
Sent: Friday, 1 July 2011 07:49
To: Chris Bizer
Cc: 'public-lod'; 'Semantic Web'; semanticweb@yahoogroups.com
Subject: Re: ANN: LDIF - Linked Data Integration Framework V0.1 released.

Hi Chris,

Sounds like a challenge indeed :)
Thanks for bringing this to my attention.

While we have a lot of experience with reasoning, we never tried to go to
the billions. I contacted Jos De Roo, the author of the EYE reasoner, to see
what would be possible. I think we might at least be able to perform some
interesting stuff.

Note, however, that performance is a separate issue from what I was saying
before. No matter how well the LDIF Hadoop implementation performs (and
I am curious to find out!), for me, it doesn't justify creating a whole new
semantics.
The important thing here is that the R2R patterns can be generated from
regular RDFS and OWL constructs (because these have a well-defined
meaning!), while the other way round is difficult and impossible in general.
If your (or anyone else's) software needs a different representation, why
not create it from RDF documents that use those Semantic Web foundations
instead of forcing people to write those instructions?
Reuse is so important in our community, and while software will someday be
able to bring a lot of data together, humans will always be responsible for
getting things right at the very base.

Cheers,

Ruben

On 30 Jun 2011, at 22:34, Chris Bizer wrote:

> Hi Ruben,
> 
>> Thanks for the fast and detailed reply, it's a very interesting
>> discussion.
>> 
>> Indeed, there are several ways for mapping and identity resolution.
>> But what strikes me is that people in the community seem to be
>> insufficiently aware of the possibilities and performance of current
>> reasoners.
> 
> Possibly. But luckily we are today in the position to just give it a try.
> 
> So an idea with my Semantic Web Challenge hat on:
> 
> Why not take the Billion Triples 2011 data set
> (http://challenge.semanticweb.org/) which consists of 2 billion triples
> that have been recently crawled from the Web and try to find all data in
> the dataset about authors and their publications, map this data to a
> single target schema and merge all duplicates.
> 
> Our current LDIF in-memory implementation is not capable of doing this as
> 2 billion triples are too much data for it. But with the planned
> Hadoop-based implementation we are hoping to get into this range.
> 
> It would be very interesting if somebody else tried to solve the task
> above using a reasoner-based approach and we could then compare the number
> of authors and publications identified as well as the duration of the data
> integration process.
> 
> Anybody interested?
> 
> Cheers,
> 
> Chris

Received on Friday, 1 July 2011 06:34:52 UTC