Re: Can we afford to offer SPARQL endpoints when we are successful? (Was "linked data hosted somewhere") from Peter Ansell on 2008-11-27 (public-lod@w3.org from November 2008)

From: Peter Ansell <ansell.peter@gmail.com>
Date: Thu, 27 Nov 2008 10:47:41 +1000
To: "Hugh Glaser" <hg@ecs.soton.ac.uk>, "public-lod@w3.org" <public-lod@w3.org>, "Bio2Rdf Mailing List" <bio2rdf@googlegroups.com>
Message-ID: <a1be7e0e0811261647o55f1f2afie13731cdb8affd01@mail.gmail.com>

2008/11/27 Hugh Glaser <hg@ecs.soton.ac.uk>

>
> Prompted by the thread on "linked data hosted somewhere" I would like to
> ask the above question that has been bothering me for a while.
>
> The only reason anyone can afford to offer a SPARQL endpoint is because it
> doesn't get used too much?
>
> As abstract components for studying interaction, performance, etc.:
> DB=KB, SQL=SPARQL.
> In fact, I often consider the components themselves interchangeable; that
> is, the first step of the migration to SW technologies for an application
> is to take an SQL-based back end and simply replace it with a SPARQL/RDF
> back end and then carry on.
>
> However.
> No serious DB publisher gives direct SQL access to their DB (I think).
> There are often commercial reasons, of course.
> But even when there are not (the Open in LOD), there are only search
> options and possibly download facilities.
> Even government organisations that have a remit to publish their data don't
> offer SQL access.
>
> Will we not have to do the same?
> Or perhaps there is a subset of SPARQL that I could offer that will allow
> me to offer a "safer" service that conforms to others' safer services (so
> it is well-understood)?
> Is this defined, or is anyone working on it?
>
> And I am not referring to any particular software - it seems to me that
> this is something that LODers need to worry about.
> We aim to take over the world; and if SPARQL endpoints are part of that
> (maybe they aren't - just resolvable URIs?), then we should make damn sure
> that we think they can be delivered.
>
> My answer to my subject question?
> No, not as it stands. And we need to have a story to replace it.
>
> Best
> Hugh
>
>
I don't think we can afford to offer public-grade infrastructure for free
unless there is corporate backing for particular endpoints. However, we can
still tentatively roll out SPARQL endpoints and resolvers in mirror
configurations, together with software that round-robins across the
endpoints so that no single endpoint is overloaded. That at least gives us
some redundancy and lets us figure out how to fine-tune the methods for
distributed queries. Once you can round-robin across SPARQL endpoints, and
still choose them intelligently based on knowledge of what is inside each
one, you can distribute the source RDF to anyone and have them give back
the information about how to access their endpoint. If people are found to
be overloading an endpoint, send them a polite message asking them either
to round-robin across the available endpoints or to get their own local
SPARQL installation, which can be configured to respond in the same way as
the public endpoint.
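
As a minimal sketch of what I mean by round-robin with knowledge of what is
inside each endpoint (this is not the actual Bio2RDF code; the class name,
the example.org URLs and the dataset names are made up for illustration):

```python
from itertools import cycle


class EndpointPool:
    """Round-robin over only those mirrors known to hold a given dataset."""

    def __init__(self, endpoints):
        # endpoints: mapping of endpoint URL -> set of dataset names it holds
        # (hypothetical structure, assumed for this sketch)
        self._endpoints = {url: set(datasets) for url, datasets in endpoints.items()}
        self._cycles = {}

    def next_endpoint(self, dataset):
        """Return the next endpoint URL, in rotation, that holds `dataset`."""
        if dataset not in self._cycles:
            # Sort so the rotation order is deterministic.
            candidates = sorted(
                url for url, held in self._endpoints.items() if dataset in held
            )
            if not candidates:
                raise LookupError("no known endpoint holds dataset %r" % dataset)
            self._cycles[dataset] = cycle(candidates)
        return next(self._cycles[dataset])


pool = EndpointPool({
    "http://a.example.org/sparql": {"geneid", "uniprot"},
    "http://b.example.org/sparql": {"geneid"},
})
# Successive "geneid" queries alternate between the two mirrors, while
# "uniprot" queries only ever go to the one mirror that holds that data.
first = pool.next_endpoint("geneid")
second = pool.next_endpoint("geneid")
```

A real client would also need to drop or back off from endpoints that stop
responding, which this sketch leaves out.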

An example implementation of this functionality is the distribution of
queries across endpoints for Bio2RDF [1]. Together with the distribution of
a combination of Virtuoso DB files [2] and source NTriples files [3], it
makes it relatively simple for people to download the software [4] and the
resolver package, and to point the configuration file at their own local
versions. That allows large-scale private use of semantics with exactly the
same URIs that resolve through the publicly available resolvers, which may
or may not be contacting public SPARQL endpoints. An example of a public
resolver contacting a combination of public and private SPARQL endpoints is
[5]. (Please don't go and overload it though because, as Hugh says, the
threat of overloading is quite real for any particular endpoint :) ).
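
Roughly, pointing the configuration at local versions amounts to something
like the sketch below. This is only an illustration of the idea, not the
real Bio2RDF configuration format: the function name, the dictionary layout
and the URLs are all assumptions.

```python
def effective_endpoints(public_config, local_overrides):
    """Return dataset -> list of endpoint URLs, preferring local mirrors.

    Both arguments map a dataset name to a list of SPARQL endpoint URLs
    (a hypothetical layout, assumed for this sketch). A dataset listed in
    local_overrides is served only from the local endpoints; everything
    else falls back to the public ones.
    """
    merged = {dataset: list(urls) for dataset, urls in public_config.items()}
    for dataset, urls in local_overrides.items():
        merged[dataset] = list(urls)
    return merged


public = {
    "geneid": ["http://a.example.org/sparql", "http://b.example.org/sparql"],
    "uniprot": ["http://a.example.org/sparql"],
}
local = {"geneid": ["http://localhost:8890/sparql"]}
config = effective_endpoints(public, local)
```

The URIs being resolved do not change at all; only the place the queries
are sent to does.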

I do agree that arbitrary SPARQL queries should be localised to private
installations, but before you do that you have to provide easy ways for
people to get private installations which resolve URIs in the same way that
they are resolved on the public web.

Cheers,

Peter

[1] http://bio2rdf.mquter.qut.edu.au/admin/configuration/rdfxml
[2] http://quebec.bio2rdf.org/download/virtuoso/indexed/
[3] http://quebec.bio2rdf.org/download/n3/
[4] http://sourceforge.net/project/platformdownload.php?group_id=142631
[5] http://bio2rdf.mquter.qut.edu.au/

Received on Thursday, 27 November 2008 00:48:24 UTC