- From: Geoff Chappell <geoff@sover.net>
- Date: Mon, 10 Oct 2005 08:22:24 -0400
- To: "'Henry Story'" <henry.story@bblfish.net>, "'Gareth Andrew'" <freega@freegarethandrew.org>, "'SWIG SWIG'" <semantic-web@w3.org>
> -----Original Message-----
> From: semantic-web-request@w3.org [mailto:semantic-web-request@w3.org] On
> Behalf Of Henry Story
> Sent: Monday, October 10, 2005 7:05 AM
> To: Gareth Andrew; SWIG SWIG
> Subject: Re: SPARQL and Web 2
>
> [...]
> > 2. Performance: Even if the data is completely open, and the
> > economics doesn't come into play, performance is a major issue.
> > SPARQL queries are designed to be written by Semantic Web
> > engineers, much as SQL queries are designed to be written by
> > database engineers. As an example, consider the following query
> >
> > PREFIX foaf:
> > PREFIX dc:
> > SELECT ?book
> > WHERE { ?book dc:creator ?who
> >         ?who foaf:name "J. K. Rowling"
> >       }
> >
> > This query (if the WHERE clause is evaluated top to bottom) is
> > highly inefficient, it first searches for all triples with
> > property dc:creator, then filters those such that the
> > dc:creator's foaf:name is "J. K. Rowling". A much more efficient
> > query reverses the patterns in the WHERE clause. I believe
> > automated query rewriting is beyond state of the art at the
> > moment and will continue to be for the foreseeable future,
>
> That is a good point. But this is going to be true whenever you open
> a query interface to the web. I worked at AltaVista and we had to
> deal with exactly the same problem. If someone asks for "The cat of
> Danny Ayers" none of the search engines first go and find all pages
> in which the word "the" appears. There just are too many. Google even
> drops it. They would first look for pages with Danny and Ayers, and
> then look for which of those pages contain the word "cat". And search
> engines have absolutely VAST indexes to search through, which most
> companies don't have.

We had to beef up our query optimization in the latest version of RDF
Gateway for just this reason (previously we'd optimized rules, but
assumed that queries were written by people intimate with the data and
so processed them as written).
With the addition of SPARQL support, we figured we probably should stop
making that assumption. Of course, this problem is harder if you don't
have index stats to work with. E.g. we can do less interesting query
optimization when we're querying against a remote source of data via a
connector (e.g. a relational database via our SQL dataservice).

[...]

> > especially when you consider the technical challenge of throwing
> > inferencing into the mix, and the social challenge of open
> > access (eg. consider the query "SELECT ?s ?p ?o WHERE
> > { ?s ?p ?o}").
>
> With respect to some queries requiring too much processing on the
> server side there are already numerous well established techniques
> for dealing with this on the web. You can simply return an error
> message, explaining that the server does not allow that type of
> query. You can cut up the results into little chunks. This is
> something to look into. The best way is to try it out.

As an example, when we opened up a database on one of our servers to
SPARQL access, we imposed a query governor (any queries that get too
complex get terminated). You can even imagine that this situation might
be advantageous in some cases - e.g. a company could distinguish between
their free/open version (only low query complexity allowed), and their
$-based version (high query complexity allowed).

> In fact I'd like to post a blog entry to point to a nice test
> database with a well laid out ontology so that developers can play
> with SPARQL. Any good ideas?

FWIW, here's one example... we loaded the uniprot data (courtesy of Eric
Jain's Uniprot RDF project) into RDF Gateway and opened it up to the
world.
The project with sample queries is described here:

  http://labs.intellidimension.com/uniprot/

or to go directly to the experimental sparql query page:

  http://labs.intellidimension.com/uniprot/query2.rsp

Of course, something like IMDB might provide a more accessible ontology
(which is not a knock against the uniprot ontology, but more its subject
matter :-).

Best,

Geoff
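
For anyone who wants to see the pattern-ordering effect from the
"J. K. Rowling" query in this thread, here is a minimal Python sketch.
It is not how RDF Gateway's optimizer works; the toy triples, the
function names (estimate, reorder, evaluate) and the "count the matches"
statistic are all assumptions made up for illustration. It evaluates the
two patterns by nested loops in the order written, then again after
sorting them by estimated match count, and reports how many triples each
run had to examine.

```python
# Toy in-memory triple store. The data is invented for illustration only.
TRIPLES = [
    ("book1", "dc:creator", "p1"),
    ("book2", "dc:creator", "p1"),
    ("book3", "dc:creator", "p2"),
    ("book4", "dc:creator", "p3"),
    ("p1", "foaf:name", "J. K. Rowling"),
    ("p2", "foaf:name", "Danny Ayers"),
    ("p3", "foaf:name", "Eric Jain"),
]


def is_var(term):
    """Terms starting with '?' are variables, everything else is a constant."""
    return term.startswith("?")


def matches(pattern, triple):
    """True if every constant in the pattern equals the triple's term."""
    return all(is_var(p) or p == t for p, t in zip(pattern, triple))


def estimate(pattern, triples):
    """Crude stand-in for an index statistic: how many triples match."""
    return sum(matches(pattern, triple) for triple in triples)


def reorder(patterns, triples):
    """Greedy heuristic (an assumption): most selective pattern first."""
    return sorted(patterns, key=lambda pattern: estimate(pattern, triples))


def evaluate(patterns, triples):
    """Nested-loop evaluation in the given order.

    Returns (solutions, examined), where examined counts triple comparisons.
    Does not handle a variable repeated within a single pattern.
    """
    solutions, examined = [{}], 0
    for pattern in patterns:
        extended = []
        for binding in solutions:
            # Substitute variables that earlier patterns already bound.
            bound = tuple(binding.get(term, term) for term in pattern)
            for triple in triples:
                examined += 1
                if matches(bound, triple):
                    new_binding = dict(binding)
                    new_binding.update(
                        {b: t for b, t in zip(bound, triple) if is_var(b)}
                    )
                    extended.append(new_binding)
        solutions = extended
    return solutions, examined


QUERY = [
    ("?book", "dc:creator", "?who"),
    ("?who", "foaf:name", "J. K. Rowling"),
]

as_written, cost_as_written = evaluate(QUERY, TRIPLES)
reordered, cost_reordered = evaluate(reorder(QUERY, TRIPLES), TRIPLES)

print(sorted(b["?book"] for b in as_written), cost_as_written)
print(sorted(b["?book"] for b in reordered), cost_reordered)
```

Both runs return the same books; only the amount of work differs. Sorting
by match count alone ignores how patterns join to each other, so a real
optimizer would need more than this, and as noted above none of it is
possible without some statistics about the data.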
Received on Monday, 10 October 2005 12:23:19 UTC