RE: web to semantic web : an automated approach from John Flynn on 2008-10-20 (semantic-web@w3.org from October 2008)

From: John Flynn <jflynn@bbn.com>
Date: Mon, 20 Oct 2008 11:00:45 -0400
To: "'ravinder thakur'" <ravinderthakur@gmail.com>, <semantic-web@w3.org>, <semantic_web@googlegroups.com>
Message-ID: <004c01c932c4$a1b60810$e5221830$@com>
The popular opinion in the community seems to be that the data for the
Semantic Web will mostly come from large structured data sources. However,
a large amount of the information on the Web is currently contained in
unstructured form. One of the key reasons that large unstructured sources of
data remain unavailable to the Semantic Web is that very little effort has
been made to make it easy and compelling for traditional HTML web site
developers to mark up their data so that it can be accessed via the
Semantic Web. Both RDFa and HTML2 are addressing this issue, but there is
still no simple way to tag specific local web site data in HTML as instances
of a widely used ontology located at a remote site. You might envision a
generally accepted ontology on a domain such as "wine" that many of the
individual HTML web sites on that subject would link their data to as
instances. A capability to search that ontology could lead back to the
marked-up instance data, which might, in turn, give web site developers a
compelling reason to go to the effort of making the changes to their web
sites. But this could only happen if a very simple way is provided for them
to mark up their data as instances of a remote ontology while also allowing
the data to show up in a traditional web browser.
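
To make the idea concrete, here is a minimal sketch of what such markup and
its extraction might look like. Everything in it is an assumption for
illustration: the "wine" namespace at example.org, the property names, and
the page fragment are hypothetical, and the extractor handles only a tiny
subset of RDFa-style attributes (about, typeof, property) rather than
implementing the real RDFa processing rules.

```python
from html.parser import HTMLParser

# Hypothetical ontology namespace; not a real published vocabulary.
WINE_NS = "http://example.org/wine#"

# A page fragment that still renders normally in a browser, but whose
# attributes mark the data as an instance of the remote ontology.
PAGE = """
<div xmlns:wine="http://example.org/wine#"
     about="#chateau-example-2005" typeof="wine:Wine">
  <h2 property="wine:name">Chateau Example 2005</h2>
  <p>Made by <span property="wine:maker">Example Estate</span>
     in <span property="wine:region">Bordeaux</span>.</p>
</div>
"""


class SimpleRDFaExtractor(HTMLParser):
    """Collects (subject, predicate, object) triples from a small RDFa subset.

    Only about/typeof/property with plain text content and one hard-coded
    prefix are handled; real RDFa processing is far more involved.
    """

    def __init__(self):
        super().__init__()
        self.triples = []
        self.subject = None
        self.pending_property = None  # property whose text is being collected
        self.text = []

    def expand(self, curie):
        # Expand "wine:Local" to a full IRI; leave anything else untouched.
        prefix, _, local = curie.partition(":")
        return WINE_NS + local if prefix == "wine" else curie

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "about" in attrs:
            self.subject = attrs["about"]
        if "typeof" in attrs and self.subject:
            self.triples.append(
                (self.subject, "rdf:type", self.expand(attrs["typeof"]))
            )
        if "property" in attrs:
            self.pending_property = self.expand(attrs["property"])
            self.text = []

    def handle_data(self, data):
        if self.pending_property:
            self.text.append(data)

    def handle_endtag(self, tag):
        # Assumes property elements contain text only, with no nested tags.
        if self.pending_property:
            literal = "".join(self.text).strip()
            self.triples.append((self.subject, self.pending_property, literal))
            self.pending_property = None


extractor = SimpleRDFaExtractor()
extractor.feed(PAGE)
for triple in extractor.triples:
    print(triple)
```

The point of the sketch is that the same HTML serves both audiences: a
browser ignores the extra attributes, while a Semantic Web tool can read the
page as instance data for the shared ontology.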

John

-----Original Message-----
From: semantic-web-request@w3.org [mailto:semantic-web-request@w3.org] On
Behalf Of ravinder thakur
Sent: Sunday, October 19, 2008 3:08 PM
To: semantic-web@w3.org; semantic_web@googlegroups.com
Subject: web to semantic web : an automated approach


Hello friends,

I have been following the semantic web for some time now and have seen
quite a lot of projects being run (dbpedia, FOAF etc) trying to generate
some semantic content. While these approaches might have been successful
in their goals, one major problem plaguing the semantic web as a whole is
the lack of semantic content. Unfortunately there is nothing in sight that
we can rely on to generate semantic content for the truckloads of
information being put on the web every day. I think one of the _wrong_
assumptions in the semantic web community is that content creators will be
creating semantic data, which I think is too much to ask of even the more
technically sound part of the web community, let alone the whole of the
web community. It hasn't happened over the last so many years and I don't
see it happening in the near future.

I think what we need to move the semantic web forward is a mechanism to
_automatically_ convert the information on the web to semantic
information. There are many software packages/services that can be used
for this purpose. I am currently developing one prototype for this
purpose. This prototype uses services from OpenCalais
(http://www.opencalais.com/) to convert ordinary text to semantic form.
This service is very limited in the entities it supports at the moment,
but it's a very good start. I am pretty sure there are many other good
options available that might be unknown to me. The currently very
primitive prototype can be seen at http://arcse.appspot.com. It currently
implements very few of the ideas I have for this. It is hosted on Google's
AppEngine, so it sometimes gives timeout messages internally; please bear
with this :).
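
As a rough illustration of the kind of conversion meant here, the sketch
below turns entities found in plain text into RDF statements. It is only a
sketch under stated assumptions: the small lookup table stands in for an
entity-extraction service and is NOT the OpenCalais API, and the
example.org namespace and type names are placeholders.

```python
import re

# Hypothetical gazetteer standing in for an entity-extraction service.
# This is NOT the OpenCalais API; a real service returns far richer data.
GAZETTEER = {
    "Tim Berners-Lee": "Person",
    "W3C": "Organization",
    "Google": "Company",
}

BASE = "http://example.org/"  # placeholder namespace (an assumption)
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def extract_entities(text):
    """Return (name, type) pairs for gazetteer entries that occur in text."""
    found = []
    for name, kind in GAZETTEER.items():
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            found.append((name, kind))
    return found


def to_ntriples(entities):
    """Serialize each entity as one rdf:type statement in N-Triples syntax."""
    lines = []
    for name, kind in entities:
        subject = BASE + "entity/" + name.replace(" ", "_")
        lines.append(f"<{subject}> <{RDF_TYPE}> <{BASE}type/{kind}> .")
    return lines


text = "Tim Berners-Lee founded the W3C."
triples = to_ntriples(extract_entities(text))
print("\n".join(triples))
```

A real pipeline would replace extract_entities with a call to an actual
extraction service and would also need to capture relations between
entities, not just their types.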

This automatic conversion, however, is not a simple task and needs work in
a lot of domains, ranging from NLP to artificial intelligence to the
semantic web to logic etc. So that's why this mail. I will be more than
happy if we can join together to form a like-minded team that can work on
solving this most important problem currently plaguing the semantic web.

Waiting for your suggestions/criticisms,
Ravinder Thakur
Received on Monday, 20 October 2008 15:01:18 UTC