- From: adasal <adam.saltiel@gmail.com>
- Date: Fri, 17 Dec 2010 11:21:01 +0000
- To: Martin Hepp <martin.hepp@ebusiness-unibw.org>
- Cc: public-lod@w3.org, semantic-web@w3.org
- Message-ID: <AANLkTim4Y43cvyVKMQauka_=9JFCN6nx-WCD+mFfqezV@mail.gmail.com>
Martin,

Thank you for the question; it and the answers are very informative for me. I am in the position of examining RDF alternatives, or additions, to the solution we are building. However, the architecture of this solution may be of interest with respect to some of your requirements:

2. Scalability - the SPARQL endpoint must handle tens of thousands of requests per hour.
3. Resource management for the endpoint - it must be possible to protect the SPARQL endpoint from costly queries and to return just a subset or refuse a query.
4. Resource management for the underlying RDBMS or Web services - it must be possible to protect the original RDBMS and the involved Web services from excessive traffic, both willful ("Semantic DDoS") and unintentional (PhD students' Python scripts gone wild).

Here is how we approach each:

2. Scalability - this is being tackled using a high-availability pattern. There is one read/write master and several read-only slave instances; each instance is otherwise a clone deployed in its own VM. The solution is modularised; communication between modules is internal HTTP, and communication between instances is mediated behind a Varnish security layer.

3. Resource management for the endpoint - Varnish also handles Edge Side Includes, acting as a cache. A TTL is placed in the header of all served content, and the design relies on new queries being able to assemble all or part of their content from the cache. Queries are JSON, and atomic elements have IDs, which facilitates this.

4. Resource management for the underlying RDBMS or Web services - as in 3., plus careful design of the JSON queries. I am not sure how this might translate to SPARQL, but the JSON queries allow a meta query which returns what is available to be queried in a domain. Within a domain you cannot simply query for everything; if you do (with something ending in /*), a sensible subset is returned depending on the path to /*.
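The fragment-cache idea above (atomic elements with IDs, each carrying a TTL, assembled into responses) can be sketched in miniature. This is illustrative Python with invented names, not the actual Varnish/ESI implementation:

```python
import time

class FragmentCache:
    """Toy TTL cache keyed by atomic-element ID (illustration only)."""

    def __init__(self):
        self._store = {}  # elem_id -> (value, expiry timestamp)

    def put(self, elem_id, value, ttl_seconds):
        self._store[elem_id] = (value, time.time() + ttl_seconds)

    def get(self, elem_id):
        entry = self._store.get(elem_id)
        if entry is None:
            return None
        value, expiry = entry
        if time.time() > expiry:        # stale fragment: evict and miss
            del self._store[elem_id]
            return None
        return value

def assemble(query_ids, cache, fetch_from_backend):
    """Serve each requested element from the cache if fresh, else from
    the backend; only cache misses generate backend traffic."""
    result, misses = {}, []
    for elem_id in query_ids:
        cached = cache.get(elem_id)
        if cached is not None:
            result[elem_id] = cached
        else:
            misses.append(elem_id)
    for elem_id in misses:
        result[elem_id] = fetch_from_backend(elem_id)
        cache.put(elem_id, result[elem_id], ttl_seconds=60)
    return result
```

In the real design this assembly happens at the edge (Varnish resolves ESI includes against its cache), so the application servers never see requests whose fragments are all still fresh.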
Web services - this is different and would depend on the service and what you need from it. We have the problem of dynamic data and have not yet decided what combination of large versus small queries, storing results to the database, and letting results live in the Varnish cache layer will be optimal. There is also the possibility of using the Hibernate cache here. We have had several discussions and come up with tentative solutions, but I think that in the end we will develop a solution in the context of an architecture and development process flexible enough to let us introduce other approaches as we understand the implications better. Here I am talking about performance and availability to the client, plus good citizenship w.r.t. the Web services.

I realise none of this is specific to the SPARQL world; I can't help with that and don't know how these ideas might transfer over. Meanwhile I remain interested in this area and in automatic RDF creation (or in understanding RDF/OWL far better so that I can hand-craft it!), because our solution has some inflexibility and requires a good deal of developer effort.

HTH. Best,

Adam

On 15 December 2010 01:21, Martin Hepp <martin.hepp@ebusiness-unibw.org> wrote:

> Dear all:
> Are there really no experiences beyond academic research regarding this
> task? I had assumed it was a pretty standard requirement...
>
> Best
>
> Martin
>
> On 11.12.2010, at 09:33, Martin Hepp wrote:
>
>> Dear all:
>>
>> There are many different ways of exposing existing relational databases
>> as SPARQL, e.g. as summarized by [1], namely Virtuoso's RDF Views, D2RQ,
>> and Triplify.
>>
>> I am looking for best practices / recommendations for the following
>> scenario:
>>
>> 1. There is a large and highly dynamic product or services database;
>> part of the data (e.g. prices) may even come from external Web services
>> (think of airfare, hotel prices).
>> 2.
>> I want to make this accessible as a SPARQL endpoint using GoodRelations
>> and FOAF.
>> 3. The mapping from the original data structures to the proper RDF must
>> be hand-crafted anyway, so automation of this process is not important.
>> 4. Creating RDF dumps is not feasible due to
>>
>> - the dynamics of the data
>> - the combinatorial complexity (not all combinations may be materialized
>> in the database; think of product variants).
>>
>> Key requirements for me are:
>>
>> 1. Maturity of the software (alpha / beta releases are no option)
>> 2. Scalability - the SPARQL endpoint must handle tens of thousands of
>> requests per hour
>> 3. Resource management for the endpoint - it must be possible to protect
>> the SPARQL endpoint from costly queries and return just a subset or
>> refuse a query
>> 4. Resource management for the underlying RDBMS or Web services - it
>> must be possible to protect the original RDBMS and involved Web services
>> from excessive traffic (both willful ("Semantic DDoS") and unintentional
>> (PhD students' Python scripts gone wild)).
>>
>> What would you recommend? My main point is really: Which tools /
>> architecture would you recommend if failure is not an option?
>>
>> Thanks for any opinions!
>>
>> Best
>>
>> Martin
>>
>> [1] A Survey of Current Approaches for Mapping of Relational Databases
>> to RDF (PDF), Satya S. Sahoo, Wolfgang Halb, Sebastian Hellmann,
>> Kingsley Idehen, Ted Thibodeau Jr, Sören Auer, Juan Sequeda, Ahmed
>> Ezzat, 2009-01-31.
>> http://www.w3.org/2005/Incubator/rdb2rdf/RDB2RDF_SurveyReport.pdf
>>
>> --------------------------------------------------------
>> martin hepp
>> e-business & web science research group
>> universitaet der bundeswehr muenchen
>>
>> e-mail: hepp@ebusiness-unibw.org
>> phone: +49-(0)89-6004-4217
>> fax: +49-(0)89-6004-4620
>> www: http://www.unibw.de/ebusiness/ (group)
>> http://www.heppnetz.de/ (personal)
>> skype: mfhepp
>> twitter: mfhepp
>>
>> Check out GoodRelations for E-Commerce on the Web of Linked Data!
>> =================================================================
>> * Project Main Page: http://purl.org/goodrelations/
>> * Quickstart Guide for Developers: http://bit.ly/quickstart4gr
>> * Vocabulary Reference: http://purl.org/goodrelations/v1
>> * Developer's Wiki: http://www.ebusiness-unibw.org/wiki/GoodRelations
>> * Examples: http://bit.ly/cookbook4gr
>> * Presentations: http://bit.ly/grtalks
>> * Videos: http://bit.ly/grvideos
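[Editorial note: regarding requirement 3 above - refusing or truncating costly queries - one simple safeguard, independent of any particular SPARQL engine, is to rewrite incoming queries so that an explicit LIMIT never exceeds a server-side cap. The sketch below is illustrative Python with a hypothetical cap; some endpoints enforce a similar result-set ceiling internally, and a production gateway would need a real SPARQL parser rather than this regex.]

```python
import re

MAX_LIMIT = 1000  # hypothetical server-side cap on result-set size

def clamp_limit(sparql_query, max_limit=MAX_LIMIT):
    """Return the query with its trailing LIMIT clamped to max_limit.
    If the query carries no LIMIT at all, append one.
    Sketch only: handles a LIMIT at the end of the query string and
    ignores OFFSET, subqueries, and comments."""
    match = re.search(r'\bLIMIT\s+(\d+)\s*$', sparql_query, re.IGNORECASE)
    if match:
        requested = int(match.group(1))
        if requested <= max_limit:
            return sparql_query            # already within the cap
        return sparql_query[:match.start()] + f"LIMIT {max_limit}"
    return sparql_query.rstrip() + f"\nLIMIT {max_limit}"
```

A gateway applying this in front of the endpoint guarantees that even an unbounded `SELECT * WHERE { ?s ?p ?o }` returns at most a sensible subset, which is exactly the "return just a subset or refuse a query" behaviour requirement 3 asks for.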
Received on Friday, 17 December 2010 11:21:35 UTC