A vision about a Semantic Web Search&Reasoning Engine

Dear Members, (apologies for the length of this posting)

as this is my first posting to any W3C Semantic Web related list in the last three 
years please find a short profile of myself below.

I am currently dreaming a Semantic Web dream based on discussions with other 
"Semantic Web friends" in the last months and I would like to share this dream with 
this list to get as much positive and negative input as possible: 
  I am dreaming of a Google-like Semantic web search engine.

The Idea:
A large-scale distributed web crawler should be able to crawl a large amount of already 
existing ontologies. The crawled pages could then be indexed and stored in a 
database based on a flexible, scalable data structure that stores RDF, OWL, and DAML+OIL 
knowledge in the form of triplets. At this stage a search form could offer 
functionality like Swoogle [0], i.e. find Semantic Web documents that use a set of 
properties or classes, define classes, or import a given ontology.
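
To make the indexing idea concrete, here is a minimal sketch of such a lookup, 
assuming a toy in-memory index. The class name TripleIndex and its methods are 
hypothetical and are neither the prototype's data structure nor Swoogle's API; 
the URIs are taken from the references below.

```python
from collections import defaultdict


class TripleIndex:
    """Toy in-memory index of triplets, grouped by the document they came from.

    Hypothetical illustration only, not the prototype's actual data structure.
    """

    def __init__(self):
        self.triples = []                      # (document, subject, predicate, object)
        self.by_predicate = defaultdict(set)   # predicate URI -> documents using it

    def add(self, document, subject, predicate, obj):
        self.triples.append((document, subject, predicate, obj))
        self.by_predicate[predicate].add(document)

    def documents_using(self, *predicates):
        """Return the documents that use every one of the given properties."""
        doc_sets = [self.by_predicate.get(p, set()) for p in predicates]
        return set.intersection(*doc_sets) if doc_sets else set()


index = TripleIndex()
index.add(
    "http://www.daml.org/2001/12/factbook/us.owl",
    "http://www.daml.org/2001/09/countries/fips#US",
    "http://www.daml.org/2001/12/factbook/factbook-ont#population",
    "278058881",
)
hits = index.documents_using(
    "http://www.daml.org/2001/12/factbook/factbook-ont#population"
)
```

A real index would of course live in a database and also cover classes and 
imports; the sketch only shows the "documents that use a set of properties" case.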

But I would like to go some steps further and include capabilities like those shown in 
the "Semantic Search Augmentation" approach [3] in Budapest in 2003 (does anyone 
know if the work has gone any further?).

In my dream a simple query, entered in a Google-like query interface, such as 
"population US" may be resolved by a "Semantic Web search and reasoning engine" 
that produces a single answer: 278058881

I think this is possible, given the (extracted) knowledge below, adding a good portion 
of logic and some kind of knowledge ranking.

   From [1]:
   <rdf:Description rdf:about="#US">
      <rdf:type rdf:resource="http://www.daml.org/2001/09/countries/fips-10-4-ont#Country" />
      <NS0:name>UNITED STATES</NS0:name> 

   From [2]:
   <rdf:Description rdf:about="http://www.daml.org/2001/09/countries/fips#US">
       <NS0:population rdf:ID="A110308">278058881</NS0:population> 

   From [4]:
   <owl:DatatypeProperty rdf:ID="name"/>

   From [5]:
   <owl:DatatypeProperty rdf:ID="population" /> 
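
As a minimal sketch of how the knowledge above could answer the query, the 
following matches one query word against the local name of a property and the 
other against the local name of a resource. The functions local_name and answer 
are hypothetical, the triplets are transcribed by hand from the excerpts above, 
and the namespace of "name" is an assumption based on [4]. No real reasoning or 
ranking is involved.

```python
FIPS = "http://www.daml.org/2001/09/countries/fips#"
FIPS_ONT = "http://www.daml.org/2001/09/countries/fips-10-4-ont#"   # assumed namespace of "name"
FACTBOOK_ONT = "http://www.daml.org/2001/12/factbook/factbook-ont#"

# (subject, predicate, object) triplets transcribed from the excerpts above
TRIPLES = [
    (FIPS + "US", FIPS_ONT + "name", "UNITED STATES"),
    (FIPS + "US", FACTBOOK_ONT + "population", "278058881"),
]


def local_name(uri):
    """Return the part of a URI after the last '#' or '/'."""
    return uri.rsplit("#", 1)[-1].rsplit("/", 1)[-1]


def answer(query, triples):
    """Return objects of triplets whose subject and property both match a query word."""
    words = {w.lower() for w in query.split()}
    return [
        o
        for s, p, o in triples
        if local_name(s).lower() in words and local_name(p).lower() in words
    ]


result = answer("population US", TRIPLES)
print(result)  # ['278058881']
```

The sketch only works because the resource in [1] and the resource in [2] share 
the same URI; joining knowledge across documents that name things differently 
is where the logic and the ranking would have to come in.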

The services of this global Semantic Web index should certainly be available as a Web 
Service, too.

I don't want to bore you with more details of this dream at this time because I think it 
is clear what vision I am speaking about. But I'm open to any discussion of it.

Current Prototypes:
I think it is possible - with some restrictions - to plan and develop such a Semantic 
Web Search&Reasoning Engine. The following prototype parts of this vision are 
currently under development:

* prototype distributed web crawler (throttled down to crawl 300,000 pages a day) is 
* prototype triplet-based data structure allows efficient reasoning over 20,000,000 
triplets (currently mainly randomly generated, useless triplets)
* prototype "search engine" produces the following still ugly but expandable output for 
the query above

    Literaltriplet (=>http://www.daml.org/2001/12/factbook/us.owl)
      US (=>http://www.daml.org/2001/09/countries/fips#US)
         population (=>http://www.daml.org/2001/12/factbook/factbook-ont#population)

Putting the things together 
Questions to the list members:
So before we try to figure out how to integrate the individual components into one 
large architecture and before we try to crawl the Semantic Web I want to ask this list 
one simple question: Is this possible?

After researching the literature and following talks I see no similar approach, or am I 
missing something? As far as I can see, MKSearch [7], announced on 
www-rdf-interest [6], does not offer such functionality, and other storage, query and 
reasoning approaches do not focus on the "normal internet user with no knowledge 
about RDQL etc." as we do.

Known Problems:
I currently see some problems and I guess and hope you can tell me many more...

* Test Ontologies: How to handle the large amount of test and example ontologies
* Ontology Ranking: How to rank ontologies and triplets to sort multiple outputs (if 
* (probably) small amount of useful ontologies on the Web
* I am totally aware of the fact that such queries could never be as exact as RDQL etc. 
and that only two-word queries are possible (more when using AND and OR to 
link query words).
* this is a very pragmatic approach
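
To illustrate the point about linking query words with AND and OR, here is a 
minimal sketch of how such a query could be split up. The functions parse_query 
and matches are hypothetical, and the sketch assumes that AND binds more tightly 
than OR and that plain whitespace also means AND.

```python
def parse_query(query):
    """Split a query into OR-alternatives, each a list of required (AND-ed) words.

    Assumption: AND binds more tightly than OR; bare whitespace is treated as AND.
    """
    alternatives = []
    for part in query.split(" OR "):
        words = [w.lower() for w in part.split() if w != "AND"]
        if words:
            alternatives.append(words)
    return alternatives


def matches(alternatives, labels):
    """True if all words of at least one alternative occur among the labels."""
    known = {label.lower() for label in labels}
    return any(all(word in known for word in alt) for alt in alternatives)


parsed = parse_query("population AND US OR inhabitants AND US")
# parsed is [['population', 'us'], ['inhabitants', 'us']]
found = matches(parsed, ["US", "population"])
```

Each alternative could then be handed to the two-word lookup on its own, with 
the ranking deciding which alternative's answer is shown first.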

I'm hoping to get some input from this list, so don't hesitate to criticize me and my 

  Best Regards


[0] http://swoogle.umbc.edu
[1] http://www.daml.org/2001/09/countries/fips
[2] http://www.daml.org/2001/12/factbook/us.owl
[3] http://www2003.org/cdrom/papers/refereed/p779/ess.html
[4] http://www.daml.org/2001/09/countries/fips-10-4-ont
[5] http://www.daml.org/2001/12/factbook/factbook-ont
[6] http://lists.w3.org/Archives/Public/www-rdf-interest/2005Nov/0001.html
[7] http://www.mksearch.mkdoc.org/
[8] http://www.fh-reutlingen.de/englisch/index.php

Short Profile:
Name:      Bjoern Hoehne

Current Status:
* CIO of an innovative manufacturer of envelopes, dispatch
document holders, packaging, and other special products derived
from paper
* Semantic Web Applied Research at Reutlingen University [8]


Received on Friday, 25 November 2005 05:33:00 UTC