RE: web to semantic web : an automated approach from John Flynn on 2008-10-20 (semantic-web@w3.org from October 2008)

From: John Flynn <jflynn@bbn.com>
Date: Mon, 20 Oct 2008 12:36:17 -0400
To: 'रविंदर ठाकुर (ravinder thakur)' <ravinderthakur@gmail.com>, <semantic-web@w3.org>, <semantic_web@googlegroups.com>
Message-ID: <000901c932d1$f9ee7dc0$edcb7940$@com>

BBN and other NLP researchers have had considerable success in using NLP to automatically extract instance data from unstructured text and map it into ontological knowledge bases. Co-reference resolution remains a difficult problem. Extracting structure and automatically creating the ontology is an even harder problem. Continued research in these areas is important because a great deal of human knowledge is contained in unstructured data. However, I’m personally convinced that in the long (maybe very long) run the best approach will be to mark up data as instances of classes and properties of ontologies as the very first step in information processing, and then automatically generate unstructured (human-readable) text from the knowledge bases, tailored to specific human information requests. Human analysts would no longer spend their time writing unstructured text but would instead populate Semantic Web knowledge bases. Of course, this approach to publishing wouldn’t apply to novels, plays, poems and other such works of art, only to tailored responses to direct requests for information.
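As a rough illustration of the "knowledge base first, text second" idea, here is a minimal Python sketch. It is not any BBN system: the triple list, the `ex:` property names, the templates and the `describe` function are all hypothetical, chosen only to show facts stored as instances of classes and properties being turned into a tailored, human-readable answer.

```python
# Hypothetical toy knowledge base: (subject, property, value) triples.
# The "ex:" names are made up for this sketch, not taken from a real ontology.
TRIPLES = [
    ("BBN", "rdf:type", "ex:Organization"),
    ("BBN", "ex:researches", "natural language processing"),
    ("John Flynn", "rdf:type", "ex:Person"),
    ("John Flynn", "ex:worksFor", "BBN"),
]

# Hypothetical sentence templates, one per property.
TEMPLATES = {
    "rdf:type": "{s} is an instance of {o}.",
    "ex:researches": "{s} conducts research on {o}.",
    "ex:worksFor": "{s} works for {o}.",
}


def describe(subject, triples=TRIPLES, templates=TEMPLATES):
    """Render every known fact about `subject` as a plain-English sentence."""
    sentences = []
    for s, p, o in triples:
        # Only use facts about the requested subject that have a template.
        if s == subject and p in templates:
            sentences.append(templates[p].format(s=s, o=o))
    return " ".join(sentences)


if __name__ == "__main__":
    # An information request about one subject produces tailored text.
    print(describe("John Flynn"))
```

A real system would of course use an actual RDF store and a far richer text generator; the point here is only that the structured facts come first and the prose is derived from them on request.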

 

John

 

From: semantic-web-request@w3.org [mailto:semantic-web-request@w3.org] On Behalf Of रविंदर ठाकुर (ravinder thakur)
Sent: Monday, October 20, 2008 11:55 AM
To: John Flynn; semantic-web@w3.org; semantic_web@googlegroups.com
Subject: Re: web to semantic web : an automated approach

 

But what's the incentive for web site owners to mark up their websites with semantic data? A few days back I was reading a study conducted by the Opera browser team that said most of the HTML generated by websites is not even valid. How can we expect them to create correct semantic data? Also, what happens to all the other user-submitted content (blogs, wikis, etc.)?

Instead, why not create a mechanism to automatically convert web data to semantic data? Opencalais.com is already doing it on a small domain; why can't/shouldn't we do it at the web's scale?


John: I realized that you are from BBN. If you are aware, can you please tell us from your experience about the state of NLP? To what extent are the current best NLP systems capable of extracting information from unformatted text? And what are the hopes for overcoming the current shortcomings in NLP systems?

Received on Monday, 20 October 2008 16:36:48 UTC