- From: Kingsley Idehen <kidehen@openlinksw.com>
- Date: Thu, 23 Jun 2011 13:32:43 +0100
- To: public-lod@w3.org
On 6/23/11 12:13 PM, Michael Brunnbauer wrote:
> re
>
> On Thu, Jun 23, 2011 at 11:32:43AM +0100, Kingsley Idehen wrote:
>>> config = {
>>>     'Googlebot':['googlebot.com'],
>>>     'Mediapartners-Google':['googlebot.com'],
>>>     'msnbot':['live.com','msn.com','bing.com'],
>>>     'bingbot':['live.com','msn.com','bing.com'],
>>>     'Yahoo! Slurp':['yahoo.com','yahoo.net']
>>> }
>> How does that deal with a DoS query inadvertently or deliberately
>> generated by a SPARQL user agent?
> It's part of the solution. It prevents countermeasures from hitting the
> crawlers that are welcome.
>
> How does WebID deal with it, except that it allows more fine-grained ACLs
> per person/agent instead of per DNS domain? WebID is a cool thing, and
> maybe crawlers will use it in the future, but Martin needs solutions right
> now.

Martin's problem isn't about right now. Yes, he used a specific example, but I can assure you it isn't about right now per se. He can blacklist offenders today, but that doesn't solve the big-picture issue. You need granularity; there is no way around it. Logic has to be put to work, and having logic within data is the key to all of this. It always has been; the WWW has finally brought these matters to the fore, in a big way.

>> Google and friends are the real problem to come; it's the inadvertent
>> SPARQL query that kicks off a transitive crawl that's going to wreak
>> havoc.

Google and friends aren't the problem, I meant to say.

> Are you talking about one agent crawling in an unfriendly way, or 10,000
> agents crawling in a friendly way but nevertheless constituting a DDoS?

I am saying: we have a new Web dimension, a data space dimension, where the WWW is now a distributed DBMS (of sorts). Thus, DBMS issues that used to be private to the enterprise are now in the public domain. A Denial of Service (DoS) can occur in a myriad of ways (deliberate or inadvertent), the most challenging being the Cartesian product scenario I referenced in an earlier post.

In the information space dimension, crawling was, and is, an activity dominated by dedicated crawlers. In the data space dimension, crawling is a natural consequence of exploring Linked Data meshes (via follow-your-nose patterns) at InterWeb scales. People will start off with a click here and there, then they'll generate some SPARQL (via user-friendly tools that generate SPARQL for them), and ultimately they'll have agents doing all of this and more, as part of a natural evolution driven by the pursuit of productivity. Walking SKOS hierarchies transitively, or putting OWL to its ultimate use (smart traversal and integration of heterogeneous data), will make this happen.

In a sense, the RDF-induced delays to Linked Data uptake could actually be a blessing in disguise, since the whole thing would have imploded on itself years ago, based on experiences of the kind unveiled by Martin. Users don't have any time or interest in an aggressively promoted WWW innovation that fails at hurdle #1 post adoption, i.e., they don't have time to wait for vendors to react and code in response to oversights associated with integral implementation issues such as:

1. Data access policies
2. Infrastructure costs.

> I think agents behaving in an unfriendly way will not be used by people
> other than their authors.

See my comments above. The Web agent is changing already. The Web can now be queried like a SQL RDBMS of yore, but in much more sophisticated fashion :-)
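To make that concrete, here is a rough, untested sketch of how a whitelist like the config dict quoted above could be enforced. The function name and the forward-confirmation step are illustration on my part, not code from Michael's mail; the usual trick is to reverse-resolve the requesting IP, check the hostname against the allowed domains, then resolve the hostname forward again to guard against spoofed PTR records:

import socket

# Crawler whitelist from Michael's mail: user-agent name -> allowed DNS domains.
config = {
    'Googlebot': ['googlebot.com'],
    'Mediapartners-Google': ['googlebot.com'],
    'msnbot': ['live.com', 'msn.com', 'bing.com'],
    'bingbot': ['live.com', 'msn.com', 'bing.com'],
    'Yahoo! Slurp': ['yahoo.com', 'yahoo.net'],
}

def is_welcome_crawler(user_agent, ip):
    # Hypothetical helper: True if the request plausibly comes from a
    # whitelisted crawler.
    domains = config.get(user_agent)
    if not domains:
        return False
    try:
        hostname = socket.gethostbyaddr(ip)[0]
    except socket.herror:
        return False
    # The reverse-resolved hostname must sit inside an allowed domain.
    if not any(hostname == d or hostname.endswith('.' + d) for d in domains):
        return False
    try:
        forward_ips = socket.gethostbyname_ex(hostname)[2]
    except socket.gaierror:
        return False
    # Forward lookup must map back to the original IP (anti-spoofing check).
    return ip in forward_ips

But, per my comments above, this only authenticates the crawlers you welcome; it does nothing about the inadvertently expensive query itself.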
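And here is the kind of innocent-looking query I have in mind re. the Cartesian product issue. The endpoint and class URIs below are purely illustrative; the point is that the two patterns share no variable, so the result set is the cross product of the two matches:

import urllib.parse
import urllib.request

# Illustrative endpoint; any public SPARQL endpoint has the same exposure.
ENDPOINT = 'http://dbpedia.org/sparql'

# The two triple patterns share no variable, so the solution sequence is the
# Cartesian product of the two matches: |?person bindings| x |?place bindings|
# rows, from a four-line query.
QUERY = """
SELECT ?person ?place WHERE {
  ?person a <http://xmlns.com/foaf/0.1/Person> .
  ?place  a <http://dbpedia.org/ontology/Place> .
}
"""

request = urllib.request.Request(
    ENDPOINT + '?' + urllib.parse.urlencode({'query': QUERY}),
    headers={'Accept': 'application/sparql-results+json'})
# urllib.request.urlopen(request) would fire it; deliberately not called,
# since executing this is precisely the inadvertent DoS under discussion.

A user-friendly query builder can emit something like this without the user ever seeing the SPARQL, which is why per-domain whitelists alone won't save an endpoint.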
> Regards,
>
> Michael Brunnbauer

--

Regards,

Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
Received on Thursday, 23 June 2011 12:33:08 UTC