Re: WordNet RDF from William Waites on 2010-09-20 (semantic-web@w3.org from September 2010)

From: William Waites <ww@styx.org>
Date: Mon, 20 Sep 2010 14:44:04 +0100
To: Antoine Isaac <aisaac@few.vu.nl>
CC: Toby Inkster <tai@g5n.co.uk>, Ian Davis <lists@iandavis.com>, semantic-web@w3.org, public-lod@w3.org, Jacco van Ossenbruggen <Jacco.van.Ossenbruggen@cwi.nl>, Mark van Assem <mark@cs.vu.nl>
Message-ID: <4C976524.7000107@styx.org>

On 10-09-20 12:45, Antoine Isaac wrote:
> Very interesting! I'm curious though: what's the application scenario
> that made you create this version?

(hopefully this is closely enough related that my reply
below isn't a non-sequitur)

I worked on a toy NLP bot that might expose some "real"
uses for representing natural language in RDF [0]. The
basic premise was to allow users to describe bibliographic
data (works and authors and such) in simple natural
language sentences and have it output RDF (FRBR-esque) [1].

(Motivated partly by the fact that I am terrible at user
interface design and had a very hard time trying to make
a web interface that allowed users to enter data with
anything other than a very simple structure).

One vocabulary that I missed while doing this is something
to represent parts of speech and grammatical syntax in
natural language. I invented something ad-hoc but it might
be useful to have a more completely thought out way to do
this. You can see some examples in the first link.

> How do you make the distinction between the two situations--I mean,
> based on which elements in the Wordnet data?

The approach that I took -- and keep in mind this was a toy,
I have doubts about the scalability of doing things this
way -- was to (1) parse the natural language sentence into an
annotated syntax tree as an intermediate form (represented
in RDF) and then (2) run specially crafted N3 inference
rules over it to generate the desired output. The inference
rules encode the semantic relationships between concepts
existing in (or across) sentences. I mostly worked with
inference rules that hinged on the main verb in the sentence
(which also happens to be the top of the syntax tree).
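
As a rough illustration of the two steps above, here is a minimal
Python sketch. It is not the bot's actual code: the real thing used
an RDF syntax tree and N3 rules, whereas this stands in plain tuples
for triples and a Python function for a rule. The SYN and BIB
namespaces, the property names and the toy "Subject verb Object"
parser are all assumptions made up for the example.

```python
# Hypothetical vocabularies; none of these names come from the original bot.
SYN = "http://example.org/syntax#"
BIB = "http://example.org/biblio#"


def parse(sentence):
    """Step 1: toy parser for 'Subject verb Object' sentences.

    Returns a set of (subject, predicate, object) triples describing a
    syntax tree whose root node carries the main verb.
    """
    subject, verb, obj = sentence.rstrip(".").split(" ", 2)
    root = "_:s1"  # blank-node-style identifier for the sentence root
    return {
        (root, SYN + "mainVerb", verb.lower()),
        (root, SYN + "subject", subject),
        (root, SYN + "object", obj),
    }


def lookup(triples, node, predicate):
    """Return the first object found for (node, predicate), or None."""
    for s, p, o in triples:
        if s == node and p == predicate:
            return o
    return None


def rule_wrote(triples):
    """Step 2: a rule that hinges on the main verb 'wrote'."""
    inferred = set()
    for node, pred, value in triples:
        if pred == SYN + "mainVerb" and value == "wrote":
            subj = lookup(triples, node, SYN + "subject")
            obj = lookup(triples, node, SYN + "object")
            if subj is not None and obj is not None:
                inferred.add((obj, BIB + "creator", subj))
    return inferred


tree = parse("Tolstoy wrote War and Peace.")
output = rule_wrote(tree)
```

Here `output` holds a single bibliographic triple linking the work
to its author; a sentence whose main verb no rule recognises simply
yields an empty set.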

In principle, a complete enough set of such inference
rules (most likely restricted to a particular domain of
discourse; a truly general set would be very hard, if it is
possible at all) would resolve the ambiguity. In the case
that makes sense there would be useful entailments, in the
case that doesn't there wouldn't. I saw this kind of
resolution of syntactic ambiguity happen a couple of times.
Resolution of homonyms might work similarly.
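
The idea of letting entailments pick the sensible reading can be
sketched as follows. This is only an illustration under assumptions:
the dictionary layout of a "reading", the creator_rule function and
the "Austen wrote Emma" example are invented here and are not taken
from the bot or from WordNet.

```python
def creator_rule(reading):
    """Hypothetical rule: a Person who 'wrote' a Work is its creator."""
    subject, obj = reading["subject"], reading["object"]
    if (reading["verb"] == "wrote"
            and subject["type"] == "Person"
            and obj["type"] == "Work"):
        return {(obj["label"], "creator", subject["label"])}
    return set()


def disambiguate(readings, rules):
    """Keep only the readings for which at least one rule entails something."""
    kept = []
    for reading in readings:
        entailed = set()
        for rule in rules:
            entailed |= rule(reading)
        if entailed:
            kept.append((reading, entailed))
    return kept


# Two candidate readings of "Austen wrote Emma": "Emma" taken as a work,
# or "Emma" taken as a person.
readings = [
    {"verb": "wrote",
     "subject": {"label": "Austen", "type": "Person"},
     "object": {"label": "Emma", "type": "Work"}},
    {"verb": "wrote",
     "subject": {"label": "Austen", "type": "Person"},
     "object": {"label": "Emma", "type": "Person"}},
]
kept = disambiguate(readings, [creator_rule])
```

Only the reading in which "Emma" is a work produces an entailment,
so it is the only one left in `kept`.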

I'm not so sure that creating a class hierarchy based on
orthographical accident makes sense. Where the words do
have a common conceptual root, certainly. But in the "crack"
example I don't think so. They are (probably) completely
different concepts that just happen to be denoted by the
same string. I might be wrong, but I don't think that
WordNet contains enough information to make this choice.

Cheers,
-w

[0] http://blog.okfn.org/2010/08/09/cataloguing-bibliographic-data-with-natural-language-and-rdf/
[1] http://pastebin.ca/1913826
--
William Waites <ww@styx.org>
Mob: +44 789 798 9965
Fax: +44 131 464 4948
CD70 0498 8AE4 36EA 1CD7 281C 427A 3F36 2130 E9F5

Received on Monday, 20 September 2010 13:45:54 UTC