- From: Renato Golin <renato@ebi.ac.uk>
- Date: Wed, 14 May 2008 23:30:39 +0100
- To: Peter Ansell <ansell.peter@gmail.com>
- CC: Kendall Grant Clark <kendall@clarkparsia.com>, Michael F Uschold <uschold@gmail.com>, Tim Berners-Lee <timbl@w3.org>, Sören Auer <auer@informatik.uni-leipzig.de>, Chris Bizer <chris@bizer.de>, Frank van Harmelen <frank.van.harmelen@cs.vu.nl>, Kingsley Idehen <kidehen@openlinksw.com>, Semantic Web Interest Group <semantic-web@w3.org>, "Fabian M. Suchanek" <f.m.suchanek@gmail.com>, Tim Berners-Lee <timbl@csail.mit.edu>, jim hendler <hendler@cs.rpi.edu>, Mark Greaves <markg@vulcan.com>, "georgi.kobilarov" <georgi.kobilarov@gmx.de>, Jens Lehmann <lehmann@informatik.uni-leipzig.de>, Richard Cyganiak <richard@cyganiak.de>, Frederick Giasson <fred@fgiasson.com>, Michael Bergman <mike@mkbergman.com>, Conor Shankey <cshankey@reinvent.com>, Kira Oujonkova <koujonkova@reinvent.com>, Aldo Gangemi <aldo.gangemi@istc.cnr.it>
Peter Ansell wrote:
> Latency on the web prohibits "fast inferencing" in any sense of the
> term, literally. (...) Don't take that the wrong way and think that
> nothing can be done, just don't expect the earth if you are only
> going to get a small city.

Hi Peter,

That's the whole point: I'm not expecting it to work fast or at scale. Nor do I want to re-develop the whole Quad-store infrastructure or mirror anything (as that messes up the validity of the information). Look at the amount of trouble CPU developers have with different cache technologies and multiple cores; I wouldn't go that way.

What I'm saying is that we have to wake up to the fact that there is no perfect data. I work with a large amount of data (the UniProt database), and it's easy to see that it's impossible to automatically annotate everything there is, even with the whole set in a single database. We spend so much time cleaning, consistency-checking, self-referencing and externally linking that we end up with no time to actually think about the data we have and what to do with it all. Given all that trouble, I'm inclined to believe we should stop trying to make everything perfectly clean and start building smarter algorithms that slowly (organically) build up more and more knowledge, not just organized data. The problem is getting a grant for that... ;)

For RDF, if you don't try to store and clean external data in your own schema, querying will be much slower, but it's impossible to store everything locally anyway. Distributed SPARQL is the way to go, though I think it will become difficult to query more than two hops away. I have to read more about it, but there is something calling for a statistical model that would reduce local quality while increasing range and, in the long run, increase global quality.

cheers,
--renato
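To make the "more than two hops" cost concrete, here is a rough client-side sketch (an illustration only, assuming the Python SPARQLWrapper library; the endpoint URLs, the class and the predicate are placeholders, not real data links) of how a naive client has to chain endpoints when the query is not pushed into a distributed engine:

    # Minimal sketch of a naive "two hop" query done entirely client-side;
    # a real distributed SPARQL setup would push this into the engine.
    # Endpoint URLs, the class and the predicate below are placeholders.
    from SPARQLWrapper import SPARQLWrapper, JSON

    HOP1_ENDPOINT = "http://dbpedia.org/sparql"          # assumed public endpoint
    HOP2_ENDPOINT = "http://example.org/uniprot/sparql"  # placeholder URL

    def select(endpoint, query):
        """Run a SELECT query against one endpoint and return its bindings."""
        client = SPARQLWrapper(endpoint)
        client.setQuery(query)
        client.setReturnFormat(JSON)
        return client.query().convert()["results"]["bindings"]

    # Hop 1: pull a handful of resources from the first store.
    first_hop = select(HOP1_ENDPOINT, """
        SELECT ?s WHERE { ?s a <http://dbpedia.org/ontology/Protein> } LIMIT 10
    """)

    # Hop 2: one extra round trip per binding -- this is where the latency
    # (and the loss of any hope of "fast inferencing") comes from.
    for row in first_hop:
        uri = row["s"]["value"]
        labels = select(HOP2_ENDPOINT, f"""
            SELECT ?label
            WHERE {{ <{uri}> <http://www.w3.org/2000/01/rdf-schema#label> ?label }}
        """)
        print(uri, [b["label"]["value"] for b in labels])

Every additional hop multiplies the number of HTTP round trips by the number of bindings from the previous hop, which is why anything past two hops quickly stops being interactive.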
Received on Wednesday, 14 May 2008 22:31:21 UTC