- From: Bjoern Hoehne <semantic-web@lists.unreach.net>
- Date: Wed, 23 Nov 2005 22:29:37 +0100
- To: semantic-web@w3.org
- Message-ID: <4384ED51.26529.2D22F65@localhost>
Dear Members,

(apologies for the length of this posting)

As this is my first posting to any W3C Semantic Web related list in the last three years, please find a short profile of myself below.

I am currently dreaming a Semantic Web dream, based on discussions with other "Semantic Web friends" over the last months, and I would like to share this dream with this list to get as much positive and negative input as possible: I am dreaming of a Google-like Semantic Web search engine.

=========
The Idea:
=========

A large-scale distributed web crawler should be able to crawl a large amount of already existing ontologies. The crawled pages could then be indexed and stored in a database based on a flexible, scalable data structure that holds RDF, OWL and DAML+OIL knowledge in the form of triplets.

At this stage a search form could offer functionality similar to Swoogle [0], i.e. find Semantic Web documents that use a given set of properties or classes, that define classes, or that import a given ontology.

But I would like to go some steps further and include capabilities like those shown in the "Semantic Search Augmentation" approach [3] presented in Budapest in 2003 (does anyone know whether that work has gone any further?).

In my dream, a simple query such as "population US", entered through a Google-like query interface, could be resolved by a "Semantic Web search and reasoning engine" that produces a single answer:

    278058881

I think this is possible, given the (extracted) knowledge below, plus a good portion of logic and some kind of knowledge ranking.
From [1]:
----------
    <rdf:Description rdf:about="#US">
      <rdf:type rdf:resource="http://www.daml.org/2001/09/countries/fips-10-4-ont#Country" />
      <NS0:code>US</NS0:code>
      <NS0:name>UNITED STATES</NS0:name>
    </rdf:Description>

From [2]:
----------
    <rdf:Description rdf:about="http://www.daml.org/2001/09/countries/fips#US">
      <NS0:population rdf:ID="A110308">278058881</NS0:population>
    </rdf:Description>

From [4]:
----------
    <owl:DatatypeProperty rdf:ID="name"/>

From [5]:
----------
    <owl:DatatypeProperty rdf:ID="population" />

The services of this global Semantic Web index should of course also be available as a Web Service.

I don't want to bore you with more details of this dream at this time, because I think the vision I am speaking of is clear. But I'm open to any discussion about it.

===================
Current Prototypes:
===================

I think it is possible, with some restrictions, to plan and develop such a Semantic Web Search & Reasoning Engine. The following prototype parts of this vision are currently under development:

* the prototype distributed web crawler (throttled down to crawl 300,000 pages a day) is working
* the prototype triplet-based data structure allows efficient reasoning over 20,000,000 triplets (currently mainly randomly generated, useless ones)
* the prototype "search engine" produces the following still ugly but expandable output for the query above:

    Literaltriplet (=>http://www.daml.org/2001/12/factbook/us.owl)
    US (=>http://www.daml.org/2001/09/countries/fips#US)
    population (=>http://www.daml.org/2001/12/factbook/factbook-ont#population)
    278058881

==============================
Putting the things together or
Questions to the list members:
==============================

So before we try to figure out how to integrate the individual components into one large architecture, and before we try to crawl the Semantic Web, I want to ask this list one simple question: Is this possible?
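As an illustration of what the lookup for "population US" involves, here is a minimal Python sketch. It is not the prototype's code. It assumes that the relative "#US" in [1] resolves to the same URI that [2] uses, and that the NS0 prefix in [1] is bound to the fips-10-4-ont namespace. The names TRIPLES, local_name, matching_subjects and answer are hypothetical, and the matching rule (one keyword names a property, the other names a subject by local name or by a literal value) is only one possible interpretation of a two-word query.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

FIPS = "http://www.daml.org/2001/09/countries/fips#"
# Assumption: NS0 in [1] is bound to this namespace.
FIPS_ONT = "http://www.daml.org/2001/09/countries/fips-10-4-ont#"
FACTBOOK_ONT = "http://www.daml.org/2001/12/factbook/factbook-ont#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# The knowledge extracted from [1] and [2], written as plain triples.
# Assumption: "#US" in [1] resolves to FIPS + "US".
TRIPLES: List[Triple] = [
    (FIPS + "US", RDF_TYPE, FIPS_ONT + "Country"),
    (FIPS + "US", FIPS_ONT + "code", "US"),
    (FIPS + "US", FIPS_ONT + "name", "UNITED STATES"),
    (FIPS + "US", FACTBOOK_ONT + "population", "278058881"),
]


def local_name(uri: str) -> str:
    """Return the part of a URI after the last '#' or '/'."""
    return uri.rsplit("#", 1)[-1].rsplit("/", 1)[-1]


def matching_subjects(word: str, triples: List[Triple]) -> Set[str]:
    """Subjects named by `word`, via their local name or a literal value."""
    w = word.lower()
    found: Set[str] = set()
    for s, _p, o in triples:
        if local_name(s).lower() == w or o.lower() == w:
            found.add(s)
    return found


def answer(query: str, triples: List[Triple]) -> List[str]:
    """Resolve a two-word query by trying both word orders."""
    words = query.split()
    if len(words) != 2:
        raise ValueError("this sketch only handles two-word queries")
    results: List[str] = []
    for prop_word, subj_word in (words, words[::-1]):
        subjects = matching_subjects(subj_word, triples)
        for s, p, o in triples:
            if s in subjects and local_name(p).lower() == prop_word.lower():
                if o not in results:
                    results.append(o)
    return results


print(answer("population US", TRIPLES))
```

With only these four triples the query has exactly one match. On a real crawl several sources would usually answer the same query, which is where the ranking problem comes in.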
After researching the literature and following talks, I see no similar approach. Or am I missing something? As far as I can see, MKSearch [7], announced on www-rdf-interest [6], does not offer such functionality, and other storage, query and reasoning approaches do not focus on the "normal internet user with no knowledge about RDQL etc." as we do.

===============
Known Problems:
===============

I currently see some problems, and I guess and hope you can tell me many more...

* Test ontologies: how to handle the large amount of test and example ontologies
* Ontology ranking: how to rank ontologies and triplets in order to sort multiple outputs (if present)
* There is (probably) only a small amount of useful ontologies on the Web
* I am totally aware of the fact that such queries could never be as exact as RDQL etc., and that only two-word queries are possible (more when using AND and OR to link query words)
* This is a very pragmatic approach

I'm hoping to get some input from this list, so don't hesitate to criticize me and my thoughts.

Best Regards
Bjoern

============
References:
============

[0] http://swoogle.umbc.edu
[1] http://www.daml.org/2001/09/countries/fips
[2] http://www.daml.org/2001/12/factbook/us.owl
[3] http://www2003.org/cdrom/papers/refereed/p779/ess.html
[4] http://www.daml.org/2001/09/countries/fips-10-4-ont
[5] http://www.daml.org/2001/12/factbook/factbook-ont
[6] http://lists.w3.org/Archives/Public/www-rdf-interest/2005Nov/0001.html
[7] http://www.mksearch.mkdoc.org/
[8] http://www.fh-reutlingen.de/englisch/index.php

==============
Short Profile:
==============

Name: Bjoern Hoehne

Current Status:
* CIO of an innovative manufacturer of envelopes, dispatch document holders, packaging, and other special products derived from paper
* Semantic Web Applied Research at Reutlingen University [8]

https://www.openbc.com/hp/Bjoern_Hoehne2
Received on Friday, 25 November 2005 05:33:00 UTC