RE: SPARQL and Web 2 from Geoff Chappell on 2005-10-10 (semantic-web@w3.org from October 2005)

From: Geoff Chappell <geoff@sover.net>
Date: Mon, 10 Oct 2005 08:22:24 -0400
To: "'Henry Story'" <henry.story@bblfish.net>, "'Gareth Andrew'" <freega@freegarethandrew.org>, "'SWIG SWIG'" <semantic-web@w3.org>
Message-ID: <07bc01c5cd95$50cef9d0$6401a8c0@gsclaptop>
> -----Original Message-----
> From: semantic-web-request@w3.org [mailto:semantic-web-request@w3.org] On
> Behalf Of Henry Story
> Sent: Monday, October 10, 2005 7:05 AM
> To: Gareth Andrew; SWIG SWIG
> Subject: Re: SPARQL and Web 2
> 
[...] 
> >      2. Performance: Even if the data is completely open, and the
> >         economics doesn't come into play, performance is a major issue.
> >         SPARQL queries are designed to be written by Semantic Web
> >         engineers, much as SQL queries are designed to be written by
> >         database engineers. As an example, consider the following query
> >
> >         PREFIX foaf: <http://xmlns.com/foaf/0.1/>
> >         PREFIX dc: <http://purl.org/dc/elements/1.1/>
> >         SELECT ?book
> >         WHERE { ?book dc:creator ?who .
> >              ?who foaf:name "J. K. Rowling"
> >         }
> >
> >         This query (if the WHERE clause is evaluated top to bottom) is
> >         highly inefficient, it first searches for all triples with
> >         property dc:creator, then filters those such that the
> >         dc:creator's foaf:name is "J. K. Rowling". A much more efficient
> >         query reverses the patterns in the WHERE clause. I believe
> >         automated query rewriting is beyond state of the art at the
> >         moment and will continue to be for the foreseeable future,
> 
> That is a good point. But this is going to be true whenever you open
> a query interface to the web. I worked at AltaVista and we had to deal
> with exactly the same problem. If someone asks for "The cat of Danny
> Ayers" none of the search engines first go and find all pages in which
> the word "the" appears. There just are too many. Google even drops it.
> They would first look for pages with Danny and Ayers, and then look for
> which of those pages contain the word "cat". And search engines have
> absolutely VAST indexes to search through, which most companies don't
> have.

We had to beef up our query optimization in the latest version of RDF
Gateway for just this reason (previously we'd optimized rules, but assumed
that queries were written by people intimate with the data and so processed
them as written). With the addition of SPARQL support, we figured we
probably should stop making that assumption. 

Of course, this problem is harder if you don't have index stats to work
with. E.g. the query optimization we can do is less interesting when we're
querying a remote data source via a connector (e.g. a relational database
via our SQL dataservice).
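
To illustrate why index stats matter here, the following is a minimal Python sketch of stats-based reordering of triple patterns. It is not RDF Gateway's optimizer. The `Pattern` type, the `STATS` table and its counts, and the `estimate` and `order_patterns` functions are all hypothetical, and the cost model (divide the triple count by the number of distinct subjects or objects when that position is bound) is a deliberately crude assumption.

```python
from typing import NamedTuple


class Pattern(NamedTuple):
    """One triple pattern; terms starting with '?' are variables."""
    s: str
    p: str
    o: str


# Hypothetical per-predicate index statistics (made-up numbers).
STATS = {
    "dc:creator": {"triples": 1_000_000, "subjects": 400_000, "objects": 200_000},
    "foaf:name": {"triples": 250_000, "subjects": 250_000, "objects": 240_000},
}


def is_var(term):
    return term.startswith("?")


def estimate(pattern, bound):
    """Rough number of matches, given the variables bound so far.

    Assumes the predicate is a constant that appears in STATS.
    """
    st = STATS[pattern.p]
    est = float(st["triples"])
    if not is_var(pattern.s) or pattern.s in bound:
        est /= st["subjects"]
    if not is_var(pattern.o) or pattern.o in bound:
        est /= st["objects"]
    return est


def order_patterns(patterns):
    """Greedily pick the cheapest remaining pattern, then bind its variables."""
    remaining = list(patterns)
    bound = set()
    ordered = []
    while remaining:
        best = min(remaining, key=lambda p: estimate(p, bound))
        remaining.remove(best)
        ordered.append(best)
        bound.update(t for t in (best.s, best.o) if is_var(t))
    return ordered


# The query from the thread, in the order it was written.
QUERY = [
    Pattern("?book", "dc:creator", "?who"),
    Pattern("?who", "foaf:name", '"J. K. Rowling"'),
]
```

With these made-up numbers, `order_patterns(QUERY)` puts the `foaf:name` pattern first, which is the reversal suggested earlier in the thread. Without a `STATS` table (the remote-connector case) there is nothing for `estimate` to work from, so the patterns would have to be run as written or ordered by a cruder heuristic.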

[...]
> 
> >         especially when you consider the technical challenge of throwing
> >         inferencing into the mix, and the social challenge of open
> >         access (eg. consider the query "SELECT ?s ?p ?o WHERE
> >         { ?s ?p ?o}").
> 
> 
> With respect to some queries requiring too much processing on the
> server side there are already numerous well established techniques for
> dealing with this on the web. You can simply return an error message,
> explaining that the server does not allow that type of query. You can
> cut up the results into little chunks. This is something to look into.
> The best way is to try it out.

As an example, when we opened up a database on one of our servers to SPARQL
access, we imposed a query governor (any query that gets too complex gets
terminated). You can even imagine that this situation might be advantageous
in some cases - e.g. a company could distinguish between its free/open
version (only low query complexity allowed) and its paid version (high
query complexity allowed).
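
A query governor of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how RDF Gateway implements it: "complexity" is approximated as the number of triples scanned, and the `Governor` class, the `QueryTerminated` exception, the `TIER_LIMITS` values and the `run_query` function are all hypothetical.

```python
class QueryTerminated(Exception):
    """Raised when a query exceeds its work budget."""


class Governor:
    """Counts units of work and aborts once the budget is exceeded."""

    def __init__(self, max_steps):
        self.max_steps = max_steps
        self.steps = 0

    def tick(self):
        self.steps += 1
        if self.steps > self.max_steps:
            raise QueryTerminated(
                "query exceeded budget of %d steps" % self.max_steps
            )


# Hypothetical budgets for a free/open tier and a paid tier.
TIER_LIMITS = {"free": 1_000, "paid": 1_000_000}


def run_query(triples, tier):
    """Evaluate { ?s ?p ?o } over an in-memory list of triples.

    One step is charged per triple scanned, so a small budget cuts off
    the "select everything" query on a large store.
    """
    governor = Governor(TIER_LIMITS[tier])
    results = []
    for triple in triples:
        governor.tick()
        results.append(triple)
    return results
```

Against a store of 5,000 triples, `run_query(store, "free")` raises `QueryTerminated` after 1,000 steps, while `run_query(store, "paid")` returns all 5,000 rows. A real engine would charge steps inside its join and inference loops rather than in a single scan.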
 
> 
> In fact I'd like to post a blog entry to point to a nice test
> database with a well laid out ontology so that developers can play
> with SPARQL. Any good ideas?

FWIW, here's one example... we loaded the uniprot data (courtesy of Eric
Jain's Uniprot RDF project) into RDF Gateway and opened it up to the world.
The project, with sample queries, is described here:

	http://labs.intellidimension.com/uniprot/

or to go directly to the experimental SPARQL query page:

	http://labs.intellidimension.com/uniprot/query2.rsp

Of course, something like IMDB might provide a more accessible ontology
(which is not a knock against the uniprot ontology, but more its subject
matter :-). 

Best,

Geoff
Received on Monday, 10 October 2005 12:23:19 UTC