Re: DBpedia: limit of triples from Hugh Williams on 2011-08-09 (public-lod@w3.org from August 2011)

From: Hugh Williams <hwilliams@openlinksw.com>
Date: Tue, 9 Aug 2011 13:24:22 +0100
To: Jörn Hees <j_hees@cs.uni-kl.de>
Cc: public-lod@w3.org, dbpedia-discussion@lists.sourceforge.net
Message-Id: <0ED59EAD-DDCD-4A6B-9D7F-00617D778D92@openlinksw.com>

Hi 

The http://dbpedia.org/sparql endpoint rate-limits the number of connections per second you can make, and also restricts result set size and query time, as per the following settings:

   [SPARQL]
   ResultSetMaxRows           = 2000
   MaxQueryExecutionTime      = 120
   MaxQueryCostEstimationTime = 1500

These are in place to make sure that everyone has an equal chance to de-reference data from dbpedia.org, as well as to guard against badly written queries/robots.

The following options are at your disposal to get round these limitations:

1. Use the LIMIT and OFFSET keywords

   You can tell a SPARQL query to return a partial result set and how many records to skip, e.g.:

	select ?s where { ?s a ?o }
	LIMIT 1000 OFFSET 2000

2. Set up a dbpedia database in your own network

   The dbpedia project provides full datasets, so you can set up your own installation on a sufficiently powerful box using Virtuoso Open Source Edition.

3. Set up a preconfigured installation of Virtuoso + database using Amazon EC2 (not free)

   See: http://www.openlinksw.com/dataspace/dav/wiki/Main/VirtAWSDBpedia351C
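
For option 1, a minimal sketch of how a client might page through a large result set with LIMIT and OFFSET while staying under the ResultSetMaxRows = 2000 setting quoted above. The names paged_query, fetch_all and run_query are hypothetical, not part of Virtuoso or any SPARQL library; run_query stands for whatever function you use to send a query string to the endpoint and get back a list of rows. Note that SPARQL only guarantees a predictable order across pages when the query has an ORDER BY clause, so the base query should probably include one.

```python
from typing import Callable, Dict, Iterator, List

# Assumption: mirrors the ResultSetMaxRows value quoted in this mail.
MAX_ROWS = 2000


def paged_query(base_query: str, page_size: int, offset: int) -> str:
    """Append LIMIT/OFFSET to a SPARQL SELECT query (hypothetical helper)."""
    if not 0 < page_size <= MAX_ROWS:
        raise ValueError(f"page_size must be between 1 and {MAX_ROWS}")
    if offset < 0:
        raise ValueError("offset must not be negative")
    return f"{base_query.rstrip()}\nLIMIT {page_size} OFFSET {offset}"


def fetch_all(
    base_query: str,
    run_query: Callable[[str], List[Dict]],
    page_size: int = 1000,
) -> Iterator[Dict]:
    """Yield every row of base_query, one page at a time.

    run_query is supplied by the caller (an assumption of this sketch):
    it takes a SPARQL string and returns the rows of that single page.
    """
    offset = 0
    while True:
        rows = run_query(paged_query(base_query, page_size, offset))
        yield from rows
        # A short (or empty) page means there is nothing left to fetch.
        if len(rows) < page_size:
            break
        offset += page_size
```

With the query from option 1, paged_query("select ?s where { ?s a ?o }", 1000, 2000) produces the same LIMIT 1000 OFFSET 2000 request shown above. Since the endpoint also limits connections per second, a real client would likely want to pause between pages.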

Best Regards
Hugh Williams
Professional Services
OpenLink Software
Web: http://www.openlinksw.com
Support: http://support.openlinksw.com
Forums: http://boards.openlinksw.com/support
Twitter: http://twitter.com/OpenLink

On 9 Aug 2011, at 13:04, Jörn Hees wrote:

> On 9. Aug. 2011, at 13:15, Pablo Mendes wrote:
>>> 'yes, i also consider DBpedia buggy in this sense (hence the crossposting)'
>> Just a small note.
>> I think you mean that the SPARQL engine behind a particular deployment of DBpedia is behaving differently from what you would desire. Although there are bugs in DBpedia, this is not one of them. :) I think it is important to make this distinction between DBpedia and the SPARQL endpoints serving its contents exactly to point out that you could provide your own implementation/wrapper that sorts/limits results the way you want.
> 
> Yes, this was imprecise. I was not talking about the SPARQL endpoint (which in fact is able to return more than 2001 triples per subject). I was talking about the standard thing that many people do with a http URI: dereference it.
> 
> I agree that other / local SPARQL endpoints are useful for mass queries and to take load off the DBpedia servers, but I don't see how they help in my case, as dereferencing still goes to the server(s) at dbpedia.org.
> 
> Cheers,
> Jörn
> 
>

Received on Tuesday, 9 August 2011 12:28:12 UTC