Re: web to semantic web : an automated approach

>>>> This is indeed an essential point in the development of the Semantic
>>>> Web. I'm mostly in the "it'll happen" camp with regards to people
>>>> creating semantic content. There are two main sources; one is that they
>>>> say that 70% of the data on the web is already in some structured form,
>>>> thus what's needed is to clarify what that structure means.

I have been in the "it will happen" camp, but nothing far-reaching seems to
be happening, so I am out. I would say that most of the data (90%) out
there is unstructured. Also, most of the structured data is specific to
companies, and they won't share it. There are people writing blogs and
Wikipedia, news websites producing content continuously, people reviewing
products and putting their opinions online; the list of unstructured data
is endless and will continue to grow with increasing Internet penetration
in third-world countries. To assume that all users will manually convert
this data to structured form seems too far-fetched. To assume that the
information being put up by these end users is of less use than, say,
Wikipedia/DBpedia would be a horrible mistake. Even if we have large data,
someone needs to combine this vast amount of RDF/OWL data and create a
global graph interlinking all of it. (BTW, I see some serious ontology
issues anyone is likely to hit in this approach.)

>>>> Also, I think IBM's SUKI http://www.research.ibm.com/UIMA/SUKI/ might
>>>> be of interest.

I have used UIMA, but it's not a one-man-army job. It's just a framework,
and there is a hell of a lot still to be done on top of it, e.g. writing
domain-specific components.



>>>> A3 is cumbersome and may produce wrong links and information - a
>>>> nightmare without implicit support for provenance. In corporate
>>>> environments A3 is already very popular, but at the broader Web scale
>>>> I'm a bit sceptical this will work well. What do you think?

I am pinning a lot of hope on the progress we have made in NLP, and no
doubt NLP will continue to improve in the near future. Currently, to
alleviate the wrong-linking/wrong-information problem, I think redundancy
of information will play an important role. If we have 10 sources of the
same piece of information and 6 NLP parsers give one view while the other
4 give another view, I am pretty sure the view on which the 6 agree will
be the right one. Also, we don't have to be 100% right (at least not in
the beginning), since ( other than your boss :) ) nobody is 100% right :)
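A minimal sketch of that redundancy idea, assuming hypothetical parser
outputs represented as (subject, predicate, object) triples (the triple
values and the `majority_view` helper are made up for illustration):

```python
from collections import Counter

def majority_view(extractions):
    """Pick the triple most parsers agree on.

    `extractions` is a list of (subject, predicate, object) tuples,
    one per NLP parser. Returns the majority triple and the fraction
    of parsers that produced it.
    """
    counts = Counter(extractions)
    triple, votes = counts.most_common(1)[0]
    return triple, votes / len(extractions)

# Hypothetical outputs from 10 parsers over the same sentence:
views = [("Paris", "capitalOf", "France")] * 6 + \
        [("Paris", "locatedIn", "France")] * 4
triple, confidence = majority_view(views)
print(triple, confidence)  # ('Paris', 'capitalOf', 'France') 0.6
```

In practice you would also want to keep the provenance of each extraction
around, so a wrong majority can be traced back to its sources.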


>>>> Some CMS like Drupal have already understood this and are rapidly
>>>> moving towards exposing their content as RDF data

Here's the problem. Drupal exposes _the data stored in Drupal_. Do we
expect everyone on the web to use Drupal? No. What happens to the
information on times.com, blogspot.com, googlegroups.com or
kashmirtimes.com? The Semantic Web is not about converting someone's data
and exposing it with a semantic view. It's about the _whole_ of the data
out there on the web, then building a web of semantic links on top of
that, and then doing reasoning on top of that, etc.


Thanks for initiating the discussion anyway. Keep it coming :)

Received on Monday, 20 October 2008 10:12:59 UTC