- From: ashok malhotra <ashok.malhotra@oracle.com>
- Date: Thu, 20 Nov 2008 05:05:36 -0800
- To: public-xg-rdb2rdf <public-xg-rdb2rdf@w3.org>
Forwarding to public mailing list. Please use this list for all technical discussion.
Ashok

-------- Original Message --------
Subject: Re: Fwd: StateOfTheArt Survey
Date: Thu, 20 Nov 2008 11:42:07 +0100
From: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
To: Satya Sahoo <sahoo.2@wright.edu>, Wolgang Halb <Wolfgang.Halb@joanneum.at>
CC: Ashok Malhotra <ashok.malhotra@oracle.com>, Sören Auer <auer@informatik.uni-leipzig.de>
References: <6800d6a74142.49247e31@wright.edu>

Hello all,

I couldn't send this response to member-xg-rdb2rdf@w3.org, as I'm not a member. It contains a reply to Satya's response and some minor comments on the state of the art document.

-------- Original Message --------
Subject: StateOfTheArt Survey
Date: Sun, 16 Nov 2008 19:08:48 -0500
From: Satya Sahoo <sahoo.2@wright.edu>
To: hellmann@informatik.uni-leipzig.de
CC: Wolfgang.Halb@joanneum.at, member-xg-rdb2rdf@w3.org

> My comments: I agree with the re-organization and have updated the
> survey document to reflect these changes.
> But I believe the work of Chebotko is relevant, since many applications
> do transform SPARQL to SQL and one of the important issues is
> preserving the semantics of the SPARQL query. For example,
> SquirrelRDF extends the ARQ query engine to convert a basic graph
> pattern to SQL using the Table-to-Class approach. For now, I have
> classified the Chebotko work as "Tools/Application" (with reference
> to your "classification of literature" point later).

Preserving SPARQL semantics when translating to SQL is a very wide field. Although some insight might be gained from it, going into detail there doesn't seem worth the effort. Almost every triple store has its own rewriting engine (RAP, Jena, Virtuoso), and that kind of rewriting can hardly be compared to SquirrelRDF, Relational.OWL, or similar tools. I personally had difficulty seeing the connection to the RDB2RDF issue.
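
For readers unfamiliar with the Table-to-Class idea mentioned in the quoted comment, the following is a minimal sketch in Python. It is not taken from SquirrelRDF or any other tool. It assumes a hypothetical mapping in which the class ex:Person corresponds to a table "person" with an "id" key column, and the properties ex:name and ex:age correspond to columns of that table. It handles only a basic graph pattern with a single subject variable, and says nothing about the harder semantics-preservation questions (OPTIONAL, FILTER, NULL handling).

```python
import sqlite3

# Hypothetical mapping, invented for this illustration:
# class IRI -> table name, property IRI -> column name.
CLASS_TO_TABLE = {"ex:Person": "person"}
PROPERTY_TO_COLUMN = {"ex:name": "name", "ex:age": "age"}


def bgp_to_sql(patterns):
    """Translate a single-subject basic graph pattern into one SELECT.

    Each pattern is a (subject, predicate, object) tuple. Variables are
    strings starting with '?'; any other object is treated as a constant.
    """
    subjects = {s for s, _, _ in patterns}
    if len(subjects) != 1:
        raise ValueError("this sketch handles exactly one subject")
    subject = subjects.pop()
    if not subject.startswith("?"):
        raise ValueError("this sketch expects the subject to be a variable")

    table = None
    select = [f"id AS {subject[1:]}"]  # the row key stands in for the subject
    where, params = [], []
    for _, predicate, obj in patterns:
        if predicate == "rdf:type":
            table = CLASS_TO_TABLE[obj]  # the class picks the table
            continue
        column = PROPERTY_TO_COLUMN[predicate]  # the property picks the column
        if isinstance(obj, str) and obj.startswith("?"):
            select.append(f"{column} AS {obj[1:]}")  # variable -> projection
        else:
            where.append(f"{column} = ?")  # constant -> selection
            params.append(obj)
    if table is None:
        raise ValueError("an rdf:type pattern is needed to pick the table")

    sql = f"SELECT {', '.join(select)} FROM {table}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params


# Small in-memory database to run the generated SQL against.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany("INSERT INTO person VALUES (?, ?, ?)", [(1, "Ann", 30), (2, "Bob", 41)])

# Roughly: SELECT ?p ?n WHERE { ?p rdf:type ex:Person ; ex:name ?n ; ex:age 41 }
sql, params = bgp_to_sql([
    ("?p", "rdf:type", "ex:Person"),
    ("?p", "ex:name", "?n"),
    ("?p", "ex:age", 41),
])
rows = conn.execute(sql, params).fetchall()
print(sql, params, rows)
```

Even in this toy form, each triple pattern turns into either a projected column or a WHERE condition on a single table; joins across tables, optional patterns and NULLs are where the semantics questions raised above begin.
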
> ________________________________________________________________
> * I removed the table criterion Query Implementation, as it is ...<snip>
> ----------------------
> My comments: I agree that the current description of the Query
> Implementation is misleading, since it discusses only data
> retrieval/transformation. But "Query Implementation" in terms of
> distributed/federated query implementation, as discussed in D2RQ and
> the Jena ARQ-based SquirrelRDF, needs to be discussed.
> The issue of query transformation from SPARQL to SQL, as implemented by
> many systems, also needs to be reviewed for completeness/soundness.
> Hence, I believe we should include "Query Implementation" but focus on
> the above listed issues.

I'm not sure in which direction this criterion goes. Does federated/distributed mean querying over different databases or just over different tables? "D2RQ lessons learned" [1] mentions some issues about this in 3.1 and 3.2. If the criterion is concerned with distributed queries over several endpoints, DARQ [2] should be considered. So there could be two questions here:

a) Can the approach easily be used for integration by a federated query engine like DARQ?
b) Does the approach allow for direct integration / distributed queries?

As for b), I would say most approaches are not capable of such a thing. Maybe Virtuoso and the mediator by Kashyap, i.e. SDS Server.

What follows are some comments on the State of the Art document. I'm using OpenOffice and the document looks very weird on my computer, so I will give snippets with a rough page and chapter number. A PDF would be good, especially because the chapter and paragraph numbers are not displayed correctly in OpenOffice.

*********
1. Problem of Mapping
page 4

"Lastly, the ability to perform reasoning leading to knowledge discovery over the RDF data integrated from multiple sources is a potentially significant value-add.
Another important aspect that we have evaluated in this survey is the use of RDF for data integration from multiple heterogeneous sources. The representation of data in RDF also enables use of reasoning tools to derive additional knowledge from existing data."

>>>>>> Reasoning is just one of the nice features of OWL, but not the major advantage of RDF as such. The real advantage is the different knowledge representation paradigm and the ability to model additional knowledge and query the graph with SPARQL. Consider the famous DBpedia SPARQL query “A soccer player with #11 shirt in a club with a stadium of over 40,000 seats born in a country with over 10 M inhabitants”, which returns 10 players. DBpedia is in RDF-S, with no OWL or reasoning used, but the mere possibility of querying it can reveal a great deal of "knowledge" (data put into a certain context).

**********
Components of Survey Framework
p.5

1. Mapping approach

The ER diagram, although close to RDF from a conceptual viewpoint, doesn't have expressed semantics, unlike a relational database. It is primarily a diagram. I'm not up to date on the latest trends in database modelling, but relational databases change over time according to application needs (e.g. performance, new values), and I'm not sure whether an ER diagram still exists for a mature database. Maybe somebody else knows whether something like ER diagram semantics exists and can be reproduced from a database. I would be interested in that.

*********
2. Mapping Representation and access:

"The mapping algorithm used for conversion of RDB to RDF may be represented in a XSLT stylesheet using XPath rules or in a XML based declarative language such as R2O. The mappings created may have wider applicability hence to..."

>>>>>> substitute: "mapping algorithm" by "paradigm" or "design", or just: "The mapping used for conversion of..."
It should be made clearer that "Access" in the title doesn't mean how the data is accessed, but how accessible/understandable/shareable/modular the mapping definition is (at least that is how I understood it).

**********
4. Mapping Implementation
p.6

"may have performance penalty due to the on-demand conversion."

>>>>> There might not be a performance penalty. See [4]: RDF Views is faster than the triple store. The table from Barrasa's slides [5], page 4, could be copied here as it gives a good overview. I'm currently not sure what the disadvantage of on-demand querying could be. Maybe that there is no update or write-back process, e.g. SPARUL or so.

Kind regards,
Sebastian Hellmann

[1] http://www.w3.org/2007/03/RdfRDB/papers/d2rq-positionpaper/
[2] http://www.eswc2008.org/final-pdfs-for-web-site/qpII-2.pdf
[3] http://www.insilicodiscovery.com/installation/index.php
[4] http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/results/index.html#comparison
[5] http://www2006.org/programme/files/pdf/p160-slides.pdf

--
All the best, Ashok
Received on Thursday, 20 November 2008 13:06:23 UTC