A vision about a Semantic Web Search&Reasoning Engine from Bjoern Hoehne on 2005-11-23 (semantic-web@w3.org from November 2005)

From: Bjoern Hoehne <semantic-web@lists.unreach.net>
Date: Wed, 23 Nov 2005 22:29:37 +0100
To: semantic-web@w3.org
Message-ID: <4384ED51.26529.2D22F65@localhost>
Dear Members, (apologies for the length of this posting)

As this is my first posting to any W3C Semantic Web related list in the last three 
years, please find a short profile of myself below.

I am currently dreaming a Semantic Web dream based on discussions with other 
"Semantic Web friends" in the last months and I would like to share this dream with 
this list to get as much positive and negative input as possible: 
  I am dreaming of a Google-like Semantic web search engine.


=======
The Idea:
=======
A large-scale distributed web crawler should be able to crawl a large number of already 
existing ontologies. The crawled pages could then be indexed and stored in a 
database based on a flexible, scalable data structure to store RDF, OWL, and DAML+OIL 
knowledge in the form of triplets. At this stage a search form could offer some kind of 
functionality like Swoogle [0], i.e. find Semantic Web documents that use a set of 
properties or classes, define classes, or import a given ontology.
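
A minimal sketch of such a document-level index, in Python. It is only an 
illustration under stated assumptions: the class DocumentIndex and its method names 
are hypothetical, everything is kept in memory, and it is neither Swoogle's interface 
nor the prototype's actual data structure.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"
OWL_IMPORTS = "http://www.w3.org/2002/07/owl#imports"


class DocumentIndex:
    """Hypothetical in-memory index from terms to the documents they occur in."""

    def __init__(self):
        self.uses = defaultdict(set)     # property or class URI -> documents using it
        self.defines = defaultdict(set)  # class URI -> documents declaring it
        self.imports = defaultdict(set)  # ontology URI -> documents importing it

    def add(self, doc, subject, predicate, obj):
        """Record one crawled triplet together with its source document."""
        self.uses[predicate].add(doc)
        if predicate == RDF_TYPE:
            self.uses[obj].add(doc)
            if obj == OWL_CLASS:
                self.defines[subject].add(doc)
        elif predicate == OWL_IMPORTS:
            self.imports[obj].add(doc)

    def documents_using(self, terms):
        """Documents that use every one of the given properties or classes."""
        hits = [self.uses.get(term, set()) for term in terms]
        return set.intersection(*hits) if hits else set()


# Example with the country statement from [1].
COUNTRY = "http://www.daml.org/2001/09/countries/fips-10-4-ont#Country"
doc = "http://www.daml.org/2001/09/countries/fips"
index = DocumentIndex()
index.add(doc, doc + "#US", RDF_TYPE, COUNTRY)
print(index.documents_using([COUNTRY]))
```

Keeping the source document next to every triplet is what makes the Swoogle-style 
lookups cheap: each query is a set intersection over precomputed entries.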

But I would like to go some steps further and include capabilities like those shown in 
the "Semantic Search Augmentation" approach [3] presented in Budapest in 2003 (does 
anyone know whether that work has gone any further?).


In my dream a simple query, entered in a Google-like query interface, such as 
"population US" could be answered by a "Semantic Web search and reasoning engine" 
that produces a single answer: 278058881

I think this is possible, given the (extracted) knowledge below, adding a good portion 
of logic and some kind of knowledge ranking.

   From [1]:
   ----------
   <rdf:Description rdf:about="#US">
      <rdf:type 
           rdf:resource="http://www.daml.org/2001/09/countries/fips-10-4-ont#Country" />
      <NS0:code>US</NS0:code>
      <NS0:name>UNITED STATES</NS0:name> 
   </rdf:Description>

   From [2]:
   ----------
   <rdf:Description rdf:about="http://www.daml.org/2001/09/countries/fips#US">
       <NS0:population rdf:ID="A110308">278058881</NS0:population> 
   </rdf:Description>

   From [4]:
   ----------
   <owl:DatatypeProperty rdf:ID="name"/>

   From [5]:
   ----------
   <owl:DatatypeProperty rdf:ID="population" /> 
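
A minimal sketch of how the two query words could be matched against the statements 
quoted above, in Python. Several things are assumptions: the NS0 prefix in [1] is 
taken to be the fips-10-4-ont namespace and the one in [2] the factbook-ont 
namespace, the relative "#US" is resolved against the URL of [1], and the function 
names and the matching strategy are hypothetical rather than the prototype's method.

```python
FIPS = "http://www.daml.org/2001/09/countries/fips#"
FIPS_ONT = "http://www.daml.org/2001/09/countries/fips-10-4-ont#"      # assumed NS0 in [1]
FACTBOOK_ONT = "http://www.daml.org/2001/12/factbook/factbook-ont#"    # assumed NS0 in [2]
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# The statements from [1] and [2] as (subject, predicate, object) triplets.
TRIPLES = [
    (FIPS + "US", RDF_TYPE, FIPS_ONT + "Country"),
    (FIPS + "US", FIPS_ONT + "code", "US"),
    (FIPS + "US", FIPS_ONT + "name", "UNITED STATES"),
    (FIPS + "US", FACTBOOK_ONT + "population", "278058881"),
]


def local_name(uri):
    """Part of a URI after the last '#' or '/'."""
    return uri.rsplit("#", 1)[-1].rsplit("/", 1)[-1]


def answer(query, triples):
    """Resolve a keyword query of the form '<property> <label>'."""
    words = {word.lower() for word in query.split()}
    # Step 1: subjects that carry a value equal to one query word ("US").
    subjects = {s for s, _, o in triples if o.lower() in words}
    # Step 2: on those subjects, values of properties named by a query word.
    return [o for s, p, o in triples
            if s in subjects and local_name(p).lower() in words]


print(answer("population US", TRIPLES))  # prints ['278058881']
```

The join works only because both documents end up using the same subject URI for 
the United States; with real data, several candidate answers would be returned and 
would need ranking.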



The services of this global Semantic Web index should of course be available as a 
Web Service, too.


I don't want to bore you with more details of this dream at this time because I think it 
is clear what vision I am describing. But I'm open to any discussion about it.



===================
Current Prototypes:
===================
I think it is possible - with some restrictions - to plan and develop such a Semantic 
Web Search&Reasoning Engine. The following prototype parts of this vision are 
currently under development:

* prototype distributed web crawler (throttled down to crawl 300,000 pages a day) is 
working
* prototype triplet-based data structure allows efficient reasoning over 20,000,000 
triplets (currently mainly randomly generated, useless triplets)
* prototype "search engine" produces the following still ugly but expandable output for 
the query above


    Literaltriplet (=>http://www.daml.org/2001/12/factbook/us.owl)
      US (=>http://www.daml.org/2001/09/countries/fips#US)
         population (=>http://www.daml.org/2001/12/factbook/factbook-ont#population)
            278058881
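
As a side note on the crawler figure in the list above: 300,000 pages a day 
corresponds to one request every 86,400 / 300,000 = 0.288 seconds. A minimal sketch 
of such a throttle in Python follows; the class name Throttle and its parameters are 
hypothetical, and it is a single-process illustration, not the distributed 
prototype.

```python
import time

SECONDS_PER_DAY = 24 * 60 * 60


class Throttle:
    """Hypothetical rate limiter that spaces fetches evenly over a day."""

    def __init__(self, pages_per_day, clock=time.monotonic, sleep=time.sleep):
        self.interval = SECONDS_PER_DAY / pages_per_day  # seconds between fetches
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = clock()

    def wait(self):
        """Block until the next fetch is allowed; return the time slept."""
        now = self.clock()
        delay = max(0.0, self.next_allowed - now)
        if delay > 0:
            self.sleep(delay)
        # Schedule the following fetch one interval after this one.
        self.next_allowed = max(now, self.next_allowed) + self.interval
        return delay


throttle = Throttle(pages_per_day=300_000)
print(throttle.interval)  # seconds between two fetches
```

A crawl loop would call throttle.wait() before each page request; the clock and 
sleep parameters exist only so that the timing can be tested without real waiting.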
		
           

==============================
Putting the things together 
or
Questions to the list members:
==============================
So before we try to figure out how to integrate the individual components into one 
large architecture and before we try to crawl the Semantic Web I want to ask this list 
one simple question: Is this possible?

After researching the literature and following talks I see no similar approach, or am I 
missing something? As far as I can see, MKSearch [7], announced on 
www-rdf-interest [6], does not offer such functionality, and other storage, query and 
reasoning approaches do not focus on the "normal internet user with no knowledge of 
RDQL etc." as we do.




===============
Known Problems:
===============
I currently see some problems and I guess and hope you can tell me many more...

* Test Ontologies: How to handle the large number of test and example ontologies
* Ontology Ranking: How to rank ontologies and triplets to sort multiple outputs (if 
present)
* (probably) small number of useful ontologies on the Web
* I am fully aware that such queries could never be as exact as RDQL etc. 
and that only two-word queries are possible (more when using AND and OR to 
link query words).
* this is a very pragmatic approach




I'm hoping to get some input from this list, so don't hesitate to criticize me and my 
thoughts.

  Best Regards

      Bjoern






============
References:
============
[0] http://swoogle.umbc.edu
[1] http://www.daml.org/2001/09/countries/fips
[2] http://www.daml.org/2001/12/factbook/us.owl
[3] http://www2003.org/cdrom/papers/refereed/p779/ess.html
[4] http://www.daml.org/2001/09/countries/fips-10-4-ont
[5] http://www.daml.org/2001/12/factbook/factbook-ont
[6] http://lists.w3.org/Archives/Public/www-rdf-interest/2005Nov/0001.html
[7] http://www.mksearch.mkdoc.org/
[8] http://www.fh-reutlingen.de/englisch/index.php







==============
Short Profile:
==============
Name:      Bjoern Hoehne


Current Status:
* CIO of an innovative manufacturer of envelopes, dispatch document
holders, packaging, and other special products derived
from paper
* Semantic Web Applied Research at Reutlingen University [8]

https://www.openbc.com/hp/Bjoern_Hoehne2
Received on Friday, 25 November 2005 05:33:00 UTC