Re: A vision about a Semantic Web Search&Reasoning Engine from Joshua Tauberer on 2005-11-27 (semantic-web@w3.org from November 2005)

From: Joshua Tauberer <tauberer@for.net>
Date: Sun, 27 Nov 2005 17:02:53 -0500
To: Bjoern Hoehne <semantic-web@lists.unreach.net>
CC: semantic-web@w3.org
Message-ID: <438A2D0D.4020902@for.net>

Bjoern Hoehne wrote:
>   I am dreaming of a Google-like Semantic web search engine.

I think this is pretty much the end-game of the Semantic Web, right?  If
we can have a natural language(-ish) query yielding answers from the
Semantic Web, what can't be done?

> * prototype triplet-based datastructure allows efficient reasoning over 
> 20,000,000 (currently mainly random generated, useless triplets)

This might grossly underestimate what would really be needed, at least
for a general-purpose search engine that's interesting to a wide
audience.  On my own I've created around 10 million real-world triples
describing U.S. federal legislation and related information
(http://www.govtrack.us/source.xpd), so 20 million triples would hold
only two databases of that size, and hopefully there will be many
databases at least that large.

Something that I think might be useful: rather than having the same
system that indexes the Semantic Web also be responsible for answering
questions, have it just point to the data sources that can help answer
a question.  The benefit is that the centralized index can work with
compact representations of what each data source can answer, rather
than with all of their data.

Let's say there's a big data source of FOAF files.  That data source
knows what information it has, so it can publish something that
compactly represents the queries it could answer:
    ?a foaf:name ?b .
    ?a foaf:mbox ?c .
    ?a foaf:knows ?d .
    ?a ex:uriMatchesGlob "http://www.livejournal.com/users/*/data/foaf" .
And it also publishes something about what protocol it supports to
answer queries (an HTTP GET, a SPARQL web service, etc.).
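
To make that a bit more concrete, here is a rough Python sketch of what
such a published description might carry.  Everything in it is an
assumption for illustration: the SourceDescription class, its field
names, the can_answer method, and the example.org endpoint are all
hypothetical, and the glob check simply stands in for the
ex:uriMatchesGlob idea above.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class SourceDescription:
    """Hypothetical compact summary that a data source publishes about itself."""
    endpoint: str            # where to send queries (made-up URL below)
    protocol: str            # e.g. "http-get" or "sparql"
    predicates: frozenset    # predicates the source has triples for
    uri_glob: str = "*"      # glob that subject URIs in the source match

    def can_answer(self, predicate, subject_uri=None):
        """True if the source advertises this predicate (and subject, if given)."""
        if predicate not in self.predicates:
            return False
        return subject_uri is None or fnmatchcase(subject_uri, self.uri_glob)


# The FOAF example from the message, written as a description object.
foaf_source = SourceDescription(
    endpoint="http://example.org/foaf/sparql",  # hypothetical endpoint
    protocol="sparql",
    predicates=frozenset({"foaf:name", "foaf:mbox", "foaf:knows"}),
    uri_glob="http://www.livejournal.com/users/*/data/foaf",
)
```

A client or index would only ever need to look at foaf_source, never at
the FOAF triples themselves.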

The indexing service can then figure out what data sources can help 
answer a question without needing to do inferencing on billions of 
triples, and can refer a client to the right places.
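
A minimal sketch of that lookup, assuming the index keeps nothing more
than a map from predicate to the sources that advertise it.  The
SourceIndex class, its method names, and the example.org endpoints are
made up for illustration; a real service would need richer pattern
matching than a plain predicate lookup.

```python
from collections import defaultdict


class SourceIndex:
    """Hypothetical central index: stores source summaries, not their triples."""

    def __init__(self):
        # predicate -> set of endpoints that say they can answer it
        self._by_predicate = defaultdict(set)

    def register(self, endpoint, predicates):
        """Record that an endpoint advertises the given predicates."""
        for predicate in predicates:
            self._by_predicate[predicate].add(endpoint)

    def sources_covering(self, query_predicates):
        """Return endpoints that advertise every predicate in the query."""
        sets = [self._by_predicate.get(p, set()) for p in query_predicates]
        if not sets:
            return []
        return sorted(set.intersection(*sets))


index = SourceIndex()
# Both endpoints below are hypothetical.
index.register("http://example.org/foaf/sparql",
               ["foaf:name", "foaf:mbox", "foaf:knows"])
index.register("http://example.org/bills/sparql",
               ["dc:title", "foaf:name"])

# A query using foaf:name and foaf:knows is referred to the FOAF source only.
referrals = index.sources_covering(["foaf:name", "foaf:knows"])
```

The index here holds a handful of strings per source, however many
triples each source actually has.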

What's needed for this is a simple vocabulary for describing data 
sources: how they can be queried and what information they contain.

-- 
- Joshua Tauberer

http://taubz.for.net

** Nothing Unreal Exists **

Received on Sunday, 27 November 2005 22:03:09 UTC