Re: DBpedia hosting burden from Kingsley Idehen on 2010-04-16 (public-lod@w3.org from April 2010)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Fri, 16 Apr 2010 15:23:47 -0400
To: public-lod <public-lod@w3.org>
CC: dbpedia-discussion <dbpedia-discussion@lists.sourceforge.net>
Message-ID: <4BC8B943.20504@openlinksw.com>

Hugh Glaser wrote:
> Since I haven't seen it mentioned yet, I thought I would.
>
> I use dbpedia all the time, but never access it, so there is zero load on
> the servers.
> And for dbpedia to be at the heart of the LOD cloud, does not mean that
> there needs to even be much of a server there.
>
> OK, I do access it very occasionally when my system stumbles across (via
> sameas, etc) a new dbpedia URI.
>
> What I mean is that I do use a lot of dbpedia URIs, but that does not mean
> that I need to resolve them, or SPARQL the dbpedia server with them.
> When someone uses the name "Barack Obama" it doesn't mean they have to
> overload the White House press office by asking it for all sorts of personal
> details; in fact they might not want to know what the White House thinks
> about him - they might be using his name to ask what Al-Jazeera says about
> him.
> In the same way, when I get a dbpedia URI, that enables me to look up on some
> site I care about what the site says about the NIR.
>
> And in terms of finding dbpedia URIs, if I want to find a dbpedia URI, I look
> whatever I want up in wikipedia, and then use the implied dbpedia URI.
>
> OK, I accept the problems about people who spider it, or want to do complex
> queries over it, but that is actually not my view of the LOD world.
>   

Spidering is what we constrain so as to preserve bandwidth. Even when 
you spider via SPARQL, we force you down the OFFSET and LIMIT route. The 
key point is that these are features (self-protection and preservation), 
not bugs or shortcomings (as these issues are sometimes framed).
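
To illustrate the OFFSET and LIMIT route, here is a minimal Python sketch of 
how a client might window one large query into small pages. The helper names 
(paged_queries, request_url), the page size, and the endpoint URL are 
assumptions made for illustration, not something the DBpedia instance 
prescribes. The base query is assumed to carry an ORDER BY, since paging 
over an unordered result set is not guaranteed to be stable.

```python
from urllib.parse import urlencode

# Assumed endpoint URL, used here only for illustration.
ENDPOINT = "http://dbpedia.org/sparql"


def paged_queries(base_query, page_size, max_pages):
    """Yield copies of base_query, each windowed with LIMIT and OFFSET."""
    for page in range(max_pages):
        offset = page * page_size
        yield f"{base_query} LIMIT {page_size} OFFSET {offset}"


def request_url(query, endpoint=ENDPOINT):
    """Build a SPARQL protocol GET URL carrying the query string."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    base = "SELECT ?s WHERE { ?s a ?type } ORDER BY ?s"
    for q in paged_queries(base, page_size=1000, max_pages=3):
        # Each URL requests one bounded page instead of the whole result set.
        print(request_url(q))
```

A polite client would fetch these pages one at a time, with a pause between 
requests, and stop as soon as a page comes back with fewer rows than the 
page size.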

Complex queries are absolutely not a problem. Remember, this is what the 
"Anytime Query" feature is all about; it's why we can host faceted 
navigation inside the Quad Store, etc. Complex queries don't chew up 
network bandwidth.

> My view is that for many applications, I am looking at some small bit of
> stuff (say LOD researchers), and so I need to do a few URI resolutions of
> the Things that I am interested in, usually in response to some demand.
> Possibly I do this transparently using something like the SWCL.
>
> In the general scheme of things, I think that the role of dbpedia
> will/should be the provision of URIs with the ability to resolve them when
> necessary (and with a reasonable expectation that the client will have a
> decent caching policy). SPARQL is a whole different ball-game, and should be
> separated out, looking at doing caching, downloads etc..
>   
The DBpedia SPARQL endpoint is an endpoint for handling SPARQL Queries.

The Descriptor Resources that are the product of URI de-referencing are 
the typical targets of crawlers, at least as the first call, before 
CONSTRUCT and DESCRIBE etc. We already have solutions for these resources 
(which include a reverse proxy setup, cache directives, etc.). In addition, 
we may also 303 to other locations (URLs) as part of URI de-referencing 
fulfillment.
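
On the client side, de-referencing with a simple cache could look roughly 
like the Python sketch below. Everything named here (build_request, 
CachingResolver, fetch_over_http, the Accept media type) is a hypothetical 
illustration rather than part of any DBpedia or Virtuoso API. It relies on 
the standard library's urlopen following 303 redirects on its own, and the 
in-memory cache is a stand-in for a real policy that would honor the 
server's cache directives.

```python
import urllib.request


def build_request(uri, accept="application/rdf+xml"):
    """Build a GET request that asks for an RDF descriptor of the URI."""
    return urllib.request.Request(uri, headers={"Accept": accept})


def fetch_over_http(uri):
    """De-reference uri; urlopen follows any 303 redirect automatically."""
    with urllib.request.urlopen(build_request(uri)) as resp:
        # geturl() is the final location, i.e. the Descriptor Resource URL.
        return resp.geturl(), resp.read()


class CachingResolver:
    """Resolve each URI at most once (a naive in-memory cache)."""

    def __init__(self, fetch=fetch_over_http):
        self._fetch = fetch
        self._cache = {}

    def resolve(self, uri):
        if uri not in self._cache:
            self._cache[uri] = self._fetch(uri)
        return self._cache[uri]
```

With a resolver like this, repeated use of the same Data Object URI costs 
the server a single request, which is the kind of client behavior the 
reverse proxy and cache directives are meant to encourage.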
> But the role of dbpedia is to provide URIs and occasional URI resolution to
> RDF or equivalent - anything that interferes with that should be challenged.
>   
The DBpedia instance is about providing a SPARQL endpoint and access to 
Descriptor Resources (née Information Resources) via Data Object URI 
de-referencing. The instance can do both, and it enforces what it 
seeks to offer.

We will make a guide so that everyone is clear :-)


Kingsley
> Best
> Hugh
>
> PS.
> The situation always reminds me of my mobile.
> I use it all the time, but never make or receive calls.
> The existence of the mobile in my pocket changes everything about whether my
> wife and I need to speak. Because we could if we wanted to, and we know the
> other could, we don't need to make the call to say "I am on the train".
>
>
>   


-- 

Regards,

Kingsley Idehen	      
President & CEO 
OpenLink Software     
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
Received on Friday, 16 April 2010 19:24:15 UTC