Re: Public SPARQL endpoints: managing (mis)use and communicating limits to users.

Hi Everyone,

On Fri, Apr 19, 2013 at 1:06 AM, Andrea Splendiani <
andrea.splendiani@iscb.org> wrote:

> On 18 Apr 2013, at 16:04, Kingsley Idehen <
> kidehen@openlinksw.com> wrote:
>
> > On 4/18/13 9:23 AM, Andrea Splendiani wrote:
> >> Hi,
> >>
> >> I think that some caching with a minimum of query rewriting would get
> rid of 90% of the select {?s ?p ?o} where {?s ?p ?o} queries.
> > Sorta.
> > Client queries are inherently unpredictable. That's always been the
> case, and that predates SPARQL. These issues also exist in the SQL RDBMS
> realm, which is why you don't have SQL endpoints delivering what SPARQL
> endpoints provide.
> I know, but I suspect that these days a lot of these "intensive" queries
> are exploratory, just to check what is in the dataset, and may end up
> being very similar in structure.
> Jerven: can you report on your experience with this? How many of the
> problematic queries are not really targeted, but more generic?

The most problematic queries are the dataset-statistics ones, but only
because of the lack of memory-use limits in Sesame 2.6 (thankfully fixed
in 2.7). Most biologically interesting queries are much faster, as they
do not touch as much data and are very index friendly, e.g. often about
10 joins with many roots. Some of the queries I write do end up as an
all-against-all string-matching exercise, but fortunately these time out
without damaging endpoint stability. The aggregate queries are often the
slowest, as they always need to go over all answers, and because they
are a new feature for some of the SPARQL endpoints they are the least
optimized.
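
To make that concrete (these are made-up sketches for this mail, not
actual logged queries), a dataset-statistics query typically has to scan
the whole store and aggregate over every answer:

    # expensive: unconstrained pattern, aggregation over all results
    SELECT ?p (COUNT(?s) AS ?uses)
    WHERE { ?s ?p ?o }
    GROUP BY ?p
    ORDER BY DESC(?uses)

while a targeted biological query anchors its patterns on constants and
so stays close to the indexes, something along the lines of:

    # index friendly: starts from known URIs and follows a few joins
    PREFIX up: <http://purl.uniprot.org/core/>
    SELECT ?protein ?annotation
    WHERE {
      ?protein a up:Protein ;
               up:organism <http://purl.uniprot.org/taxonomy/9606> ;
               up:annotation ?annotation .
    }
    LIMIT 100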

For the UniProt web use case, using a SPARQL endpoint instead of an
RDBMS makes sense, although at the moment full-text indexes plus a
key-value store work very well too.
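
To give an idea of what I mean by the web use case (a rough sketch, not
our actual code): an entry page mostly needs "everything about accession
X", which on the SPARQL side is little more than a DESCRIBE on the entry
URI, and on the key-value side a single get by accession.

    # one entry page is roughly one DESCRIBE on the protein URI
    DESCRIBE <http://purl.uniprot.org/uniprot/P05067>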

Regards,
Jerven

Received on Tuesday, 30 April 2013 13:13:30 UTC