Re: web to semantic web : an automated approach

Talking of NLP to populate the SW reminded me of a note I wrote a while
back (the beginnings of AKT, 2003), when I was trying to understand the types
of things (agents/services/endpoints/RDF Graphs) that a live SW populator
would need.

Being a pure functional person, I characterised a bit of it as follows:
************************
AKT Concepts - a Functional Example
===================================

Classifier: Ontology -> Corpus -> Document -> Classification
A Classifier takes
    an Ontology to classify with respect to,
    a Corpus to train on,
    a Document to classify
and returns a Classification.

Corpus-maker: Classifier -> Ontology -> [Document] -> Corpus
A Corpus-maker takes
    a Classifier to perform the function,
    an Ontology to classify with respect to,
    a bag of Documents
and returns a Corpus.

Corpus: [Document x Classification].

So now we know why the Semantic Web hasn't happened yet.
Classifiers need Corpuses, and Corpuses need Classifiers.
In fact the above suggests that what we really need to compute
is the fixed point of this mutually recursive process!
************************

My apologies for the notation, which includes partial application of the
Classifier function to make sense of it.
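
For anyone who prefers it spelled out, the same signatures read roughly as
follows in Haskell. The concrete types are placeholders of my own for
illustration, not anything from an actual AKT system:

-- The note's signatures as Haskell types (placeholder representations).
type Document       = String
type Ontology       = [String]                      -- concepts to classify against
type Classification = String                        -- the concept a document falls under
type Corpus         = [(Document, Classification)]  -- Corpus: [Document x Classification]

-- Classifier: Ontology -> Corpus -> Document -> Classification
type Classifier = Ontology -> Corpus -> Document -> Classification

-- Corpus-maker: Classifier -> Ontology -> [Document] -> Corpus
-- The corpus being built is also the corpus the classifier trains on,
-- so for any classifier that actually inspects its training data this
-- knot never unties -- which is exactly the deadlock in the note.
corpusMaker :: Classifier -> Ontology -> [Document] -> Corpus
corpusMaker classify ont docs = corpus
  where
    corpus = [ (d, classify ont corpus d) | d <- docs ]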

But I suspect the premise is still correct (although nothing new).
We would like to classify, mark up, or translate into RDF (etc.) loads of
documents or text fragments, and we would like tools to do it. One standard
way is to use a machine learning system, but that requires training sets.
In the absence of training sets, the bootstrap process is started by hand,
which is what is happening, but it is a long process.
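
Continuing the sketch above, that hand-started bootstrap amounts to seeding
with a small hand-built Corpus and iterating until the labelling stops
changing. Purely illustrative, and with no guarantee of convergence in
general:

-- Corpus-maker with an explicit training corpus, so the circularity
-- becomes a fixed-point iteration rather than a knot.
makeCorpus :: Classifier -> Ontology -> Corpus -> [Document] -> Corpus
makeCorpus classify ont trainingSet docs =
  [ (d, classify ont trainingSet d) | d <- docs ]

-- Start from a hand-built seed corpus and reclassify until stable.
bootstrap :: Classifier -> Ontology -> Corpus -> [Document] -> Corpus
bootstrap classify ont seed docs = go seed
  where
    go corpus
      | corpus' == corpus = corpus      -- labelling stable: a fixed point
      | otherwise         = go corpus'  -- retrain on the new labelling
      where
        corpus' = makeCorpus classify ont corpus docs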

But there must be some resources being created that could be used as training
sets (our ReSIST project has built one in Dependable Systems, and now uses a
trained system to classify IEEE papers from DSN).

For example, is anyone using DBpedia/Wikipedia as a training set to build a
classifier system that will relate documents to DBpedia concepts?
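
In the same vocabulary, the DBpedia/Wikipedia idea would amount to a
ready-made seed: Wikipedia article/category pairs as the initial Corpus, with
DBpedia concepts as the Ontology. The identifiers and texts below are made up
for illustration only:

-- A hypothetical seed for the bootstrap above.
dbpediaOntology :: Ontology
dbpediaOntology = ["dbpedia:Dependability", "dbpedia:Fault_tolerance"]

wikipediaSeed :: Corpus
wikipediaSeed =
  [ ("... article text about fault-tolerant design ...", "dbpedia:Fault_tolerance")
  , ("... article text about dependable computing ...",  "dbpedia:Dependability")
  ]

-- e.g.  classifyPapers c docs = bootstrap c dbpediaOntology wikipediaSeed docs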

Best
Hugh

On 20/10/2008 13:03, "रविंदर ठाकुर (ravinder thakur)"
<ravinderthakur@gmail.com> wrote:

> There will always be people overly optimistic and overly pessimistic about
> anything in the world, so I won't take the case of the SW as an exception.
> On the other hand, we shouldn't dismiss an option just because there doesn't
> seem to be anyone asking for it. When Faraday found a way to create
> electricity, it was a useful invention even though nobody was using or
> asking for electricity. I don't see _everyone_ using RDF, but it can be used
> to solve many problems that no other technology can boast of. This is
> especially true since more and more data is coming to the web and we need a
> better way to analyse/search/present that data to the end users looking for it.
>
> But coming to the main point, I don't see why the semantic web as envisioned
> by its main proponents shouldn't work. I am not saying that the first attempt
> at it will be the last one, but an honest attempt is much better than some
> untested opinions :).
>
> What we need is
> a) an NLP system (similar to the one at www.opencalais.com) that converts
> the data on the web to its semantic form (RDF/OWL etc.) for a much broader
> set of concepts,
> b) a store for the data from a), and
> c) a reasoner for the data stored in b).
>
>
> As I see it, a) is the hardest part, and ignoring performance/scalability
> issues, b) and c) already exist. It's the lack of a) that is keeping us from
> achieving anything great with the semantic web.
>
>
>

Received on Thursday, 23 October 2008 09:55:15 UTC