- From: Bjoern Hoehne <semantic-web@lists.unreach.net>
- Date: Wed, 23 Nov 2005 22:29:37 +0100
- To: semantic-web@w3.org
- Message-ID: <4384ED51.26529.2D22F65@localhost>
Dear Members, (apologies for the length of this posting)
As this is my first posting to any W3C Semantic Web related list in the last three
years, please find a short profile of myself below.
I am currently dreaming a Semantic Web dream based on discussions with other
"Semantic Web friends" in recent months, and I would like to share this dream with
this list to get as much positive and negative input as possible:
I am dreaming of a Google-like Semantic web search engine.
=======
The Idea:
=======
A large-scale distributed web crawler should be able to crawl a large amount of already
existing ontologies. The crawled pages could then be indexed and stored in a
database based on a flexible, scalable data structure to store RDF, OWL, DAML+OIL
knowledge in the form of triplets. At this stage a search form could offer some kind of
functionality like Swoogle [0], i.e. find semantic web documents that use a set of
properties or classes, or define classes, or import a given ontology.
But I would like to go some steps further and include capabilities like those shown in
the "Semantic Search Augmentation" approach [3] presented in Budapest in 2003 (does
anyone know whether the work has gone any further?).
In my dream a simple query - based on the Google-like query interface - like
"population US" may be solved by a "Semantic Web search and reasoning engine"
that produces a single answer: 278058881
I think this is possible, given the (extracted) knowledge below, adding a good portion
of logic and some kind of knowledge ranking.
From [1]:
----------
<rdf:Description rdf:about="#US">
<rdf:type
rdf:resource="http://www.daml.org/2001/09/countries/fips-10-4-ont#Country" />
<NS0:code>US</NS0:code>
<NS0:name>UNITED STATES</NS0:name>
</rdf:Description>
From [2]:
----------
<rdf:Description rdf:about="http://www.daml.org/2001/09/countries/fips#US">
<NS0:population rdf:ID="A110308">278058881</NS0:population>
</rdf:Description>
From [4]:
----------
<owl:DatatypeProperty rdf:ID="name"/>
From [5]:
----------
<owl:DatatypeProperty rdf:ID="population" />
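A minimal sketch, in Python, of how the extracts above could be combined to answer
"population US". This is not the prototype's code. It assumes that the NS0 prefix
stands for the fips-10-4-ont namespace in [1] and for the factbook-ont namespace in
[2], and that the relative "#US" in [1] resolves against the document URL of [1]. The
names local_name and answer are hypothetical, and the keyword matching is only a
crude stand-in for the "logic and knowledge ranking" mentioned above.

```python
# Hypothetical sketch: resolve a two-word keyword query over hand-copied triples.

FIPS = "http://www.daml.org/2001/09/countries/fips#"
FIPS_ONT = "http://www.daml.org/2001/09/countries/fips-10-4-ont#"      # assumed NS0 in [1]
FACTBOOK_ONT = "http://www.daml.org/2001/12/factbook/factbook-ont#"    # assumed NS0 in [2]
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# (subject, predicate, object); literals are kept as plain strings here.
TRIPLES = [
    (FIPS + "US", RDF_TYPE, FIPS_ONT + "Country"),
    (FIPS + "US", FIPS_ONT + "code", "US"),
    (FIPS + "US", FIPS_ONT + "name", "UNITED STATES"),
    (FIPS + "US", FACTBOOK_ONT + "population", "278058881"),
]


def local_name(uri):
    """Return the part of a URI after the last '#' or '/'."""
    return uri.replace("#", "/").rsplit("/", 1)[-1]


def answer(query, triples):
    """Try each keyword as a property name and the other as a resource label."""
    words = query.split()
    results = []
    for prop_word in words:
        for label_word in words:
            if prop_word == label_word:
                continue
            # Resources that carry label_word as a literal value (e.g. code "US").
            subjects = {s for s, p, o in triples if o.lower() == label_word.lower()}
            # Values of properties on those resources whose local name is prop_word.
            for s, p, o in triples:
                if s in subjects and local_name(p).lower() == prop_word.lower():
                    results.append(o)
    return results


print(answer("population US", TRIPLES))
```

The join works only because both documents use the same URI for the United States;
if the sources named the resource differently, some extra mapping step would be needed.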
The services of this global Semantic Web index should certainly be available as a Web
Service, too.
I don't want to bore you with more details of this dream at this time because I think
the vision I am describing is clear. But I'm open to any discussion about it.
===================
Current Prototypes:
===================
I think it is possible - with some restrictions - to plan and develop such a Semantic
Web Search&Reasoning Engine. The following prototype parts of this vision are
currently under development:
* prototype distributed web crawler (throttled down to crawl 300,000 pages a day) is
working
* prototype triplet-based data structure allows efficient reasoning over 20,000,000
triplets (currently mainly randomly generated, useless ones)
* prototype "search engine" produces the following still ugly but expandable output for
the query above
Literaltriplet (=>http://www.daml.org/2001/12/factbook/us.owl)
US (=>http://www.daml.org/2001/09/countries/fips#US)
population (=>http://www.daml.org/2001/12/factbook/factbook-ont#population)
278058881
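To make the idea of a triplet-based data structure more concrete, here is a minimal
in-memory sketch in Python. It is not the prototype's actual design, which is not
described here. The class name TripleStore and its methods are hypothetical, and the
approach (one hash index each for subject, predicate and object) is just one common
way to make pattern lookups cheap.

```python
from collections import defaultdict


class TripleStore:
    """Hypothetical in-memory store indexed by subject, predicate and object."""

    def __init__(self):
        self._triples = set()
        self._by_s = defaultdict(set)
        self._by_p = defaultdict(set)
        self._by_o = defaultdict(set)

    def add(self, s, p, o):
        """Store one triple and register it in all three indexes."""
        t = (s, p, o)
        self._triples.add(t)
        self._by_s[s].add(t)
        self._by_p[p].add(t)
        self._by_o[o].add(t)

    def match(self, s=None, p=None, o=None):
        """Return all triples matching the pattern; None acts as a wildcard."""
        candidates = [
            index.get(key, set())
            for index, key in ((self._by_s, s), (self._by_p, p), (self._by_o, o))
            if key is not None
        ]
        if not candidates:
            return set(self._triples)
        # Intersect, starting from the smallest candidate set.
        candidates.sort(key=len)
        result = set(candidates[0])
        for other in candidates[1:]:
            result &= other
        return result


store = TripleStore()
store.add(
    "http://www.daml.org/2001/09/countries/fips#US",
    "http://www.daml.org/2001/12/factbook/factbook-ont#population",
    "278058881",
)
hits = store.match(p="http://www.daml.org/2001/12/factbook/factbook-ont#population")
print(sorted(hits))
```

Storing every triple three times trades memory for lookup speed; whether that scales
to 20,000,000 triplets in memory would have to be measured.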
==============================
Putting the things together
or
Questions to the list members:
==============================
So before we try to figure out how to integrate the individual components into one
large architecture and before we try to crawl the Semantic Web I want to ask this list
one simple question: Is this possible?
After researching the literature and following talks I see no similar approach, or am I
missing something? As far as I can see, MKSearch [7], announced on
www-rdf-interest [6], does not offer such functionality, and other Storage, Query and
Reasoning approaches do not focus on the "normal internet user with no knowledge
about RDQL etc." as we do.
===============
Known Problems:
===============
I currently see some problems and I guess and hope you can tell me many more...
* Test Ontologies: How to handle the large amount of test and example ontologies?
* Ontology Ranking: How to rank ontologies and triplets to sort multiple outputs (if
present)?
* (probably) small amount of useful ontologies on the Web
* I am fully aware that such queries could never be as exact as RDQL etc.
and that only two-word queries are possible (more when using AND and OR to
link query words).
* this is a very pragmatic approach
I'm hoping to get some input from this list, so don't hesitate to criticize me and my
thoughts.
Best Regards
Bjoern
============
References:
============
[0] http://swoogle.umbc.edu
[1] http://www.daml.org/2001/09/countries/fips
[2] http://www.daml.org/2001/12/factbook/us.owl
[3] http://www2003.org/cdrom/papers/refereed/p779/ess.html
[4] http://www.daml.org/2001/09/countries/fips-10-4-ont
[5] http://www.daml.org/2001/12/factbook/factbook-ont
[6] http://lists.w3.org/Archives/Public/www-rdf-interest/2005Nov/0001.html
[7] http://www.mksearch.mkdoc.org/
[8] http://www.fh-reutlingen.de/englisch/index.php
==============
Short Profile:
==============
Name: Bjoern Hoehne
Current Status:
* CIO of an innovative manufacturer of envelopes, dispatch document
holders, packaging, and other special products derived
from paper
* Semantic Web Applied Research at Reutlingen University [8]
https://www.openbc.com/hp/Bjoern_Hoehne2
Received on Friday, 25 November 2005 05:33:00 UTC