[Fwd: Re: Fwd: StateOfTheArt Survey]

Forwarding to public mailing list.
Please use this list for all technical discussion.
Ashok

-------- Original Message --------
Subject:  Re: Fwd: StateOfTheArt Survey
Date:  Thu, 20 Nov 2008 11:42:07 +0100
From:  Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
To:  Satya Sahoo <sahoo.2@wright.edu>, Wolfgang Halb 
<Wolfgang.Halb@joanneum.at>
CC:  Ashok Malhotra <ashok.malhotra@oracle.com>, Sören Auer 
<auer@informatik.uni-leipzig.de>
References:  <6800d6a74142.49247e31@wright.edu>



Hello all,
I couldn't send this response to member-xg-rdb2rdf@w3.org, as I'm not a 
member. It contains a reply to Satya's response and some minor 
comments on the State of the Art document.

-------- Original Message --------
Subject:  StateOfTheArt Survey
Date:  Sun, 16 Nov 2008 19:08:48 -0500
From:  Satya Sahoo <sahoo.2@wright.edu>
To:  hellmann@informatik.uni-leipzig.de
CC:  Wolfgang.Halb@joanneum.at, member-xg-rdb2rdf@w3.org

> My comments: I agree with the re-organization and have updated the 
> survey document to reflect these changes.
> But I believe the work of Chebotko is relevant, since many applications 
> do transform SPARQL to SQL, and one of the
> important issues is preserving the semantics of the SPARQL query. For 
> example,
> SquirrelRDF extends the ARQ query engine to convert a basic graph 
> pattern to SQL
> using the Table-to-Class approach. For now, I have classified the 
> Chebotko work as
> "Tools/Application" (with reference to your "classification of 
> literature" point later).

The topic "preserving SPARQL semantics in SQL" is a very wide field.
Although some insight might be gained from it, it doesn't seem 
worth the effort to go into detail there. Almost every triple store has 
its own rewriting engine (RAP, Jena, Virtuoso). Such rewriting can hardly 
be compared to SquirrelRDF, Relational.OWL or similar tools. I personally 
had difficulty seeing the connection to the RDB2RDF issue.


> ________________________________________________________________
> * I removed the table criteria Query Implementation, as it is ...<snip>
> ----------------------
> My comments: I agree that the current description of the Query 
> Implementation is
> misleading, since it discusses only data retrieval/transformation. But 
> "Query
> Implementation" in terms of distributed/federated query implementation,
> as discussed in D2RQ and Jena ARQ-based SquirrelRDF, needs to be discussed.
> The issue of query transformation from SPARQL to SQL, as implemented by
> many systems, also needs to be reviewed for completeness/soundness.
> Hence, I believe we should include "Query Implementation" but focus on 
> the above listed issues.

I'm not sure in which direction this criterion goes. Does 
federated/distributed mean querying over different databases or just 
over different tables? Some issues around this are mentioned in "D2RQ 
lessons learned" [1], sections 3.1 and 3.2. If the criterion is concerned 
with distributed queries over several endpoints, DARQ [2] should be 
considered. So there could be two questions here: a) Can the approach 
easily be used for integration by a federated query engine like DARQ? 
b) Does the approach itself allow for direct integration/distributed 
queries? As for b), I would say most approaches are not capable of such 
a thing. Maybe Virtuoso and the mediator by Kashyap, i.e. SDS Server.


The following are some comments on the State of the Art document.
I'm using OpenOffice and the document looks very weird on my computer, 
so I will give snippets and a rough page and chapter number. A PDF would 
be good, especially because the chapter and paragraph numbers are not 
displayed correctly in OpenOffice.


*********
1. Problem of Mapping page 4
"Lastly, the ability to perform reasoning leading to knowledge discovery 
over the RDF data integrated from multiple sources is a potentially 
significant value-add.

Another important aspect that we have evaluated in this survey is the 
use of RDF for data integration from multiple heterogeneous sources. The 
representation of data in RDF also enables use of reasoning tools to 
derive additional knowledge from existing data."
>>>>>>
Reasoning is just one of the nice features of OWL, but not the major 
advantage of RDF as such. The real advantage is the different knowledge 
representation paradigm and the ability to model additional knowledge 
and query the graph with SPARQL. Consider the famous DBpedia SPARQL query 
“A soccer player with #11 shirt in a club with a stadium of over 40,000 
seats born in a country with over 10 M inhabitants”, which returns 10 
players. DBpedia is in RDFS, with no OWL or reasoning used, but the 
ability to query it can still reveal a great deal of "knowledge" (data 
put into a certain context).
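
To illustrate the point that plain graph querying already puts data into 
context, here is a minimal Python sketch. It is only an illustration: the 
triples, the predicate names (shirt_number, plays_for, born_in, 
stadium_capacity, population) and the helper functions are all made up for 
this example and are not DBpedia's vocabulary or data. It joins a few 
triples the way the query above would, with no OWL and no reasoner involved.

```python
# Toy triple set. Every subject, predicate and value below is hypothetical
# and exists only for illustration; none of it is taken from DBpedia.
TRIPLES = {
    ("player_a", "shirt_number", 11),
    ("player_a", "plays_for", "club_x"),
    ("player_a", "born_in", "country_p"),
    ("player_b", "shirt_number", 11),
    ("player_b", "plays_for", "club_y"),
    ("player_b", "born_in", "country_p"),
    ("player_c", "shirt_number", 9),
    ("player_c", "plays_for", "club_x"),
    ("player_c", "born_in", "country_p"),
    ("player_d", "shirt_number", 11),
    ("player_d", "plays_for", "club_x"),
    ("player_d", "born_in", "country_q"),
    ("club_x", "stadium_capacity", 55000),
    ("club_y", "stadium_capacity", 20000),
    ("country_p", "population", 60000000),
    ("country_q", "population", 5000000),
}


def objects(subject, predicate):
    """Return all objects of triples matching (subject, predicate, ?o)."""
    return [o for (s, p, o) in TRIPLES if s == subject and p == predicate]


def find_players(shirt=11, min_seats=40000, min_population=10000000):
    """Join triple patterns by hand, as a graph query would, without inference."""
    results = set()
    for (player, predicate, value) in TRIPLES:
        # Pattern 1: ?player shirt_number 11
        if predicate != "shirt_number" or value != shirt:
            continue
        # Pattern 2: ?player plays_for ?club . ?club stadium_capacity ?seats
        big_stadium = any(
            seats > min_seats
            for club in objects(player, "plays_for")
            for seats in objects(club, "stadium_capacity")
        )
        # Pattern 3: ?player born_in ?country . ?country population ?n
        big_country = any(
            n > min_population
            for country in objects(player, "born_in")
            for n in objects(country, "population")
        )
        if big_stadium and big_country:
            results.add(player)
    return sorted(results)


if __name__ == "__main__":
    print(find_players())
```

The answer comes purely from following links between explicitly stated 
triples and filtering on values; nothing is inferred.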

**********
Components of Survey Framework p.5
1. Mapping approach
The ER diagram, although close to RDF from a conceptual viewpoint, 
doesn't have expressed semantics, unlike a relational database. It is 
primarily a diagram. I'm not up to date on the latest trends in database 
modelling, but relational databases change over time according to 
application needs (e.g. performance, new values), and I'm not sure whether 
an ER diagram still exists for a mature database. Maybe somebody 
else knows whether something like ER diagram semantics exists and is 
reproducible from a database. I would be interested in that.
*********
2. Mapping Representation and access:
"The mapping algorithm used for conversion of RDB to RDF may be 
represented in a XSLT stylesheet using XPath rules or in a XML based 
declarative language such as R2O. The mappings created may have wider 
applicability hence to..."
>>>>>>
Substitute "mapping algorithm" with "paradigm" or "design", or just 
write: "The mapping used for conversion of...".
It should be made clearer that "Access" in the title doesn't mean how 
the data is accessed, but how 
accessible/understandable/shareable/modular the mapping definition is
(at least that is how I understood it).
**********
4. Mapping Implementation p.6
"may have performance penalty due to the on-demand conversion."
>>>>>
There might not be a performance penalty. See [4], where RDF Views is 
faster than the triple store. The table from Barrasa's slides [5], page 4, 
could be copied here, as it gives a good overview. I'm currently not 
sure what the disadvantage of on-demand querying could be. Maybe that 
there is no update or write-back process, e.g. via SPARUL.

Kind regards,
Sebastian Hellmann


[1] http://www.w3.org/2007/03/RdfRDB/papers/d2rq-positionpaper/
[2] http://www.eswc2008.org/final-pdfs-for-web-site/qpII-2.pdf
[3] http://www.insilicodiscovery.com/installation/index.php
[4] http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/results/index.html#comparison
[5] http://www2006.org/programme/files/pdf/p160-slides.pdf




-- 
All the best, Ashok

Received on Thursday, 20 November 2008 13:06:23 UTC