- From: Kingsley Idehen <kidehen@openlinksw.com>
- Date: Wed, 14 Apr 2010 13:09:11 -0400
- To: Leigh Dodds <leigh.dodds@talis.com>
- CC: Ivan Mikhailov <imikhailov@openlinksw.com>, baran <baran@goldmail.de>, semanticweb <semanticweb@yahoogroups.com>, public-lod <public-lod@w3.org>, SW-forum <semantic-web@w3.org>, dbpedia-discussion <dbpedia-discussion@lists.sourceforge.net>, dbpedia-announcements <dbpedia-announcements@lists.sourceforge.net>, Chris Bizer <chris@bizer.de>
Leigh Dodds wrote:
> Hi,
>
> 2010/4/14 Kingsley Idehen <kidehen@openlinksw.com>:
>
>> When we refer to an "option" we are talking about a mirror rather than
>> an alternative place where DBpedia data sets have been loaded.
>>
>
> I deliberately didn't use the word "mirror" as that sets expectations
> around offering same features, using same technology, etc. So I meant
> what I said: there are other SPARQL endpoints that provide live,
> public access to the dbpedia data.
>
>
Fine, but Ivan specifically commented about "Mirror". Do understand that the issues aren't about SPARQL per se. It's about what's happening around the instance at http://dbpedia.org. Crawling the Descriptor Resources is chewing up "across the wire" bandwidth.

>> As for usage levels, the issues have very little to do with sane SPARQL
>> queries and everything to do with crawlers that actually attempt to
>> perform wholesale imports of the entire data set (many attempt this, as
>> we can see from the HTTP logs and the payload sizes). In addition,
>> remember, we are serving up actual RDF-based descriptor resources, and
>> these too are crawled wholesale with the intent of populating other data
>> spaces (these are also crawled aggressively by LOD and non-LOD crawlers).
>>
>> We are not just providing a SPARQL endpoint, we are also serving RDF
>> descriptor resources in a variety of representation formats. And as I've
>> stated above, the dominant use pattern is crawling the RDF descriptor
>> resources, which (without protection) simply obliterates "across the
>> wire" bandwidth, as is the case with any document server on a public
>> network such as the World Wide Web.
>>
>
> Yes I'm aware of what dbpedia is, and also the challenges of running a
> live operational service :)
>
My comment wasn't a "what is DBpedia?" lecture. It was about clarifying the crux of the matter, i.e., bandwidth consumption and its effects on other DBpedia users (as well as on our own non-DBpedia-related Web properties).
> I was just curious about usage volumes. We all talk about how central
> dbpedia is in the LOD cloud picture, and wondered if there were any
> publicly accessible metrics to help add some detail to that.
>
Well, here is the critical detail: people typically crawl DBpedia. They crawl it more than any other Data Space in the LOD cloud. They do so because DBpedia is still quite central to the burgeoning Web of Linked Data. When people aren't crawling, they are executing CONSTRUCTs or DESCRIBEs via SPARQL, which still reflects an "export from DBpedia and import into my data space" mindset.

That's as simple and precise as this matter is. From a SPARQL perspective, DBpedia is quite microscopic; it's when you factor in the crawler mentality and network bandwidth that issues arise, and we deliberately have protection in place for crawlers.

Kingsley

> Cheers,
>
> L.
>
>

--

Regards,

Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
Received on Wednesday, 14 April 2010 17:09:49 UTC