Re: Managing Co-reference (Was: A Semantic Elephant?) from Renato golin on 2008-05-14 (semantic-web@w3.org from May 2008)

From: Renato golin <renato@ebi.ac.uk>
Date: Wed, 14 May 2008 23:30:39 +0100
To: Peter Ansell <ansell.peter@gmail.com>
CC: Kendall Grant Clark <kendall@clarkparsia.com>, Michael F Uschold <uschold@gmail.com>, Tim Berners-Lee <timbl@w3.org>, Sören Auer <auer@informatik.uni-leipzig.de>, Chris Bizer <chris@bizer.de>, Frank van Harmelen <frank.van.harmelen@cs.vu.nl>, Kingsley Idehen <kidehen@openlinksw.com>, Semantic Web Interest Group <semantic-web@w3.org>, "Fabian M. Suchanek" <f.m.suchanek@gmail.com>, Tim Berners-Lee <timbl@csail.mit.edu>, jim hendler <hendler@cs.rpi.edu>, Mark Greaves <markg@vulcan.com>, "georgi.kobilarov" <georgi.kobilarov@gmx.de>, Jens Lehmann <lehmann@informatik.uni-leipzig.de>, Richard Cyganiak <richard@cyganiak.de>, Frederick Giasson <fred@fgiasson.com>, Michael Bergman <mike@mkbergman.com>, Conor Shankey <cshankey@reinvent.com>, Kira Oujonkova <koujonkova@reinvent.com>, Aldo Gangemi <aldo.gangemi@istc.cnr.it>
Message-ID: <482B680F.2030401@ebi.ac.uk>

Peter Ansell wrote:
> Latency on the web prohibits "fast inferencing" in any sense of the
> term, literally. (...) Don't take that the
> wrong way and think that nothing can be done, just don't expect the
> earth if you are only going to get a small city.

Hi Peter,

That's the whole point: I'm not expecting it to be fast or big.
Nor do I want to re-develop the whole quad-store infrastructure or
mirror anything (as that messes up the validity of the information).
Look at the amount of trouble CPU developers have with different cache
technologies and multiple cores; I wouldn't go that way.

What I'm saying is that we have to wake up to the fact that there is no
perfect data. I work with a large amount of data (the UniProt database)
and it's easy to see that it's impossible to automatically annotate
everything there is, even with the whole set in one single database.

We spend so much time cleaning, consistency-checking, self-referencing
and externally linking that we end up having no time to actually think
about the data we have and what to do with it all.

Given all that trouble, I'm inclined to believe that we should stop
trying to make everything perfectly clean and start getting smarter
algorithms that slowly (organically) build up more and more knowledge,
and not just organized data.

The problem is getting a grant for that... ;)

For RDF, if you don't try to store and clean external data in your own
schema, queries will get much slower, but it's impossible to store
everything locally anyway.

Distributed SPARQL is the way to go, but nevertheless it'll become 
difficult to query more than two hops away, I think.
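
A rough back-of-the-envelope sketch of why each extra hop hurts, not a
measurement. It assumes every resource links to a fixed number of
resources on remote endpoints, that each remote lookup costs one network
round trip, and that a hop cannot start before the previous one has
finished. The function names, the fan-out and the latency figures are
all hypothetical, picked only for illustration.

```python
import math


def requests_per_hop(fan_out, hops):
    """Remote lookups needed at each hop, assuming (hypothetically) that
    every resource links to `fan_out` resources on remote endpoints."""
    return [fan_out ** level for level in range(1, hops + 1)]


def estimated_seconds(fan_out, hops, latency_s, parallelism=1):
    """Very rough wall-clock estimate for following links `hops` deep.

    Each hop has to wait for the previous one, and within a hop at most
    `parallelism` lookups are assumed to run at the same time.
    """
    return sum(
        math.ceil(n / parallelism) * latency_s
        for n in requests_per_hop(fan_out, hops)
    )


if __name__ == "__main__":
    # Illustrative numbers only: 10 links per resource, 200 ms per lookup.
    for hops in (1, 2, 3):
        print(hops, requests_per_hop(10, hops)[-1],
              estimated_seconds(10, hops, 0.2, parallelism=10))
```

Under these assumptions the number of lookups grows geometrically with
the number of hops, so even generous parallelism only delays the point
at which the query becomes impractical.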

I have to read more about it, but all this seems to call for a
statistical model that reduces the local quality while increasing the
range and, in the long run, the global quality.

cheers,
--renato

Received on Wednesday, 14 May 2008 22:31:21 UTC