Re: Managing Co-reference (Was: A Semantic Elephant?)

2008/5/15 Renato Golin <renato@ebi.ac.uk>:
> Peter Ansell wrote:
>>
>> Latency on the web prohibits "fast inferencing" in any sense of the
>> term, literally. (...) Don't take that the
>> wrong way and think that nothing can be done, just don't expect the
>> earth if you are only going to get a small city.
>
> Hi Peter,
>
> That's the whole point, I'm not expecting it to work fast or at scale. Nor do
> I want to re-develop the whole Quad-store infrastructure or mirror anything
> (as that messes up the validity of the information). Look at the amount of
> trouble CPU designers have with different cache technologies and multiple
> cores; I wouldn't go that way.
>
> What I'm saying is that we have to wake up to the fact that there is no
> perfect data. I work with a large amount of data (the UniProt database) and
> it's easy to see that it's impossible to automatically annotate everything
> there is, even with the whole set in one single database.
>
> We spend so much time cleaning, consistency-checking, self-referencing and
> externally-linking that we end up with no time to actually think about the
> data we have and what to do with it all.
>
> Given all that trouble, I'm inclined to believe that we should stop trying
> to make everything perfectly clean and start building smarter algorithms
> that slowly (organically) build up more and more knowledge, not just
> organized data.
>
> The problem is getting a grant for that... ;)
>
> For RDF, if you don't try to store and clean external data in your own
> schema it'll get much slower, but it's impossible to store everything
> locally anyway.
>
> Distributed SPARQL is the way to go, but it'll nevertheless become difficult
> to query more than two hops away, I think.
>
> I have to read more about it, but something here calls for a statistical
> model that would reduce local quality while increasing the range and, in the
> long run, increasing the global quality.
>

In terms of the smarter algorithms, would this be a place to put people's
FOAF reputations on the line by having a distributed directory (built from
personal FOAF profiles) of statements like <SPARQL endpoint>
foaf:trustedEndpointFor <SPARQLNamedGraphURI> ? That way you at least have
an initial filter: if you trust the person, you are more likely to get
useful data from the endpoints they recommend. Simple statements like this
would also go a long way towards getting Distributed SPARQL working
effectively. If you agreed with someone, you could echo their statement
yourself... This would reduce the web of endpoints dramatically, although it
still wouldn't guarantee in any way that you get 100% clean data or low
latency.
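
A quick sketch in Turtle of what such a statement might look like inside a
personal FOAF profile (foaf:trustedEndpointFor is just the property proposed
above, not part of the published FOAF vocabulary, and all URIs are made-up
examples):

    @prefix foaf: <http://xmlns.com/foaf/0.1/> .

    # The profile's owner; whoever publishes this document is staking
    # their reputation on the statement below.
    <http://example.org/people/alice#me>
        a foaf:Person ;
        foaf:name "Alice" .

    # Alice vouches that this endpoint serves useful data for this
    # named graph. (foaf:trustedEndpointFor is the hypothetical property
    # proposed in this thread, not an existing FOAF term.)
    <http://sparql.example.org/endpoint>
        foaf:trustedEndpointFor <http://data.example.org/graphs/proteins> .

A consumer crawling the FOAF profiles of people it already trusts (via
foaf:knows, say) could then collect these statements and rank endpoints by
how many trusted contacts echo them.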

Peter

Received on Wednesday, 14 May 2008 23:19:39 UTC