
StateOfTheArt

From: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
Date: Mon, 10 Nov 2008 15:12:57 +0100
Message-ID: <49184169.2090708@informatik.uni-leipzig.de>
To: public-xg-rdb2rdf@w3.org

Dear All,
I recently reviewed the StateOfTheArt Wiki Page and found it puzzling at 
times. I already changed some parts:

* Reorganized the summary of the literature survey. There were lots of
entries in categories named "Other". I moved R2O, Sahoo et al. and
Dartgrid to the Domain-Semantics section. R2O and ODEMapster are similar
to D2RQ, which is also a mapping language. Both can be used manually to
model Domain-Semantics, R2O even more so, since it requires a
pre-existing Domain Ontology. Dartgrid is similar to Hu et al. as it
provides a visual alignment tool. The target of Sahoo et al. is
answering questions with the help of SPARQL, but the technique used is
ETL (correct me if I'm mistaken), so I also moved it to Domain-Semantics.
The work of Chebotko is imho completely out of scope, as he is concerned
with SPARQL-to-SQL rewriting for triple stores, whose data is already in
RDF. I think the reference can be removed completely.

* I removed the table criterion Query Implementation, as it is
misleading. It can be merged with mapping implementation. Some entries
were of the form "static" (ETL) and had "SPARQL" as "query
implementation". Once ETL is performed, the result can naturally be
loaded into a triple store and queried with SPARQL. Likewise, an
On-Demand Query-Driven approach can easily produce an RDF dump. The main
criterion here should be whether the data is retrieved on the fly from
the database or just transformed once.
The "Data Integration" criterion for the table doesn't really
distinguish much, since all approaches certainly aim at integrating data
(into the Semantic Web). More important criteria would be whether
approaches 1. need a pre-existing ontology, 2. go beyond
database-semantics in the direction of domain-semantics, or 3. are used
in real projects that successfully integrate more than one database.
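
A minimal sketch of the static versus on-the-fly distinction discussed
above, in Python with the standard sqlite3 module. The person table, the
example.org base URI and all function names are made up for illustration
and do not correspond to any of the surveyed tools. Literals are not
escaped, so this is not a real RDF serializer.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical base URI, for illustration only


def make_db():
    """Create a tiny in-memory database with one hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO person (id, name) VALUES (?, ?)",
        [(1, "Alice"), (2, "Bob")],
    )
    return conn


def row_to_triple(row_id, name):
    """Map one row to one (subject, predicate, object) triple."""
    return (f"<{BASE}person/{row_id}>", f"<{BASE}vocab/name>", f'"{name}"')


def etl_dump(conn):
    """Static approach: transform the whole table once into a list of triples."""
    rows = conn.execute("SELECT id, name FROM person ORDER BY id")
    return [row_to_triple(row_id, name) for row_id, name in rows]


def on_demand(conn, person_id):
    """On-the-fly approach: query the database each time a resource is requested."""
    rows = conn.execute(
        "SELECT id, name FROM person WHERE id = ?", (person_id,)
    )
    return [row_to_triple(row_id, name) for row_id, name in rows]
```

If a row is updated after etl_dump() has run, the dump still holds the
old value, while on_demand() reflects the change. That is the property
the table criterion should capture.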

Proposal for classification of literature:
The literature seems to divide into four classes:
1. Schema/ontology Alignment:
Hu et al., Dartgrid. Both try to create an alignment from a DB schema
to an existing ontology. Related work in this direction is plentiful;
COMA++ [1] is just one example.
2. Database Mining
Li, DB2OWL, RDBToOnto, Tirmizi all start from the existing database and 
try to extract as much information as possible from the database schema.
They also stop there, which means they do not use any external sources 
such as existing domain ontologies.
3. Integration/Domain Semantics
Sahoo et al. are mainly concerned with modeling domain semantics correctly.
4. Languages/Servers
D2RQ, R2O, RDF Views and Asio Tools all have their own language, and
all provide means to model domain semantics, though mostly manually.


I hope this helps. I can also offer to restructure the StateOfTheArt
Wiki page myself, but I did not dare to make such changes on my own
authority.
Regards,
Sebastian Hellmann

[1] http://dbs.uni-leipzig.de/de/Research/coma.html

-- 
http://bis.informatik.uni-leipzig.de/SebastianHellmann
Received on Tuesday, 11 November 2008 15:26:12 UTC
