Re: A long but hopefully interesting introduction

On Mar 5, 2005, at 4:59 AM, Phil Dawes wrote:

> Hi Ben,
> This sounds like an interesting project, if a bit ambitious!

Ambitious is putting it politely; the only way you can start a project 
like this is with complete ignorance of what it will take to complete. 
:) Luckily most of it is behind me; the main things that remain are 
teaching the chatbot how to ask people to clarify N-ary relationships 
<http://www.w3.org/TR/swbp-n-aryRelations/> and how to handle certain 
constraints, and then figuring out the best way to push all this metadata.

[snip]
> E.g. how do you generate URIs? How do you disambiguate words with the
> same spelling (sleeper, sleeper and sleeper)? How do you disambiguate
> senses of the same word (myserver1a the server, myserver1a the 
> dnsname).
> RDF solves this problem by requiring that the author generates
> separate URIs for each sense/meaning, but this doesn't map well to a
> user experience.

      Since the default interface to the data is a Wiki-ish thing, 
there's already a built-in fully-qualified URI for every unique term, 
and it even leads to a page with a description. Of course, there's no 
way to guarantee that nodes will represent only one concept -- the 
chatbot would have to be much too hostile to enforce that. :) But there 
are things you can do. If you try to make something an instance or type 
of something when it already has a class associated with it, the bot 
can challenge it.
me: "Mandarin is an instance of dialect."
likn: "I thought it was a type of fruit!"
      Then, the user is forced to either agree/disagree with the 
assertion "Mandarin is a type of fruit," or make a statement like 
"Mandarin is also an instance of dialect." When likn hears "also", it 
will press you: "do you mean that Mandarin has two meanings, or that 
Mandarin, a type of fruit, is also an instance of dialect?" (This is 
where it becomes useful to parse input like "Uh, the first one.")
      If you tell likn Mandarin has two meanings, it can make a new node 
(and URI) for the second one. Users of the chat system will never need 
to know there's a separate node for the other meaning. They'll ask 
"What is mandarin?" and get the reply "Mandarin is a dialect or a type 
of fruit." However, the Wiki users will notice -- on the "Mandarin" 
page, all of a sudden they see "(fruit)" next to the name, and see that 
there's a new link to "Mandarin (dialect)." This will hopefully inspire 
them to separate the text if both topics are discussed in the first 
node, or add information if there's nothing in there about the dialect.
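      A rough Python sketch of the challenge-then-split behavior described 
above. This is not likn's actual code: the Node class, the challenge and 
split_meanings helpers, and the example.org base URI are all made up for 
illustration, and it assumes a node simply keeps a list of (relation, 
class) pairs.

```python
BASE = "http://example.org/wiki/"  # assumed base URI, not likn's real one


class Node:
    def __init__(self, name, sense=None):
        self.name = name
        self.sense = sense      # e.g. "fruit"; None until the term is split
        self.classes = []       # (relation, class) pairs

    @property
    def uri(self):
        # An unsplit term uses its bare name; a split one gets "(sense)" appended.
        label = self.name if self.sense is None else f"{self.name} ({self.sense})"
        return BASE + label.replace(" ", "_")


def challenge(node, relation, cls):
    """Return a challenge if the node already has a different class, else None."""
    if node.classes and (relation, cls) not in node.classes:
        old_relation, old_cls = node.classes[0]
        return f"I thought it was {old_relation} {old_cls}!"
    return None


def split_meanings(node, relation, cls):
    """Label the existing node with its old class and create a node for the new meaning."""
    node.sense = node.classes[0][1]
    other = Node(node.name, sense=cls)
    other.classes.append((relation, cls))
    return other


mandarin = Node("Mandarin")
mandarin.classes.append(("a type of", "fruit"))
print(challenge(mandarin, "an instance of", "dialect"))
dialect = split_meanings(mandarin, "an instance of", "dialect")
print(mandarin.uri)
print(dialect.uri)
```

Both nodes keep the same name, so the chat side can go on treating 
"Mandarin" as one term while the Wiki side sees two pages.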
      But whenever possible, I avoid challenging the user, because it 
can get annoying. So if a user makes an assertion that a thing is 
related to a subclass of something it's already related to, likn'll 
accept it and quietly ignore (but not delete) the more general 
assertion:
me: "Mandarin is an instance of citrus."
likn: "Okay, got it. Anything else about Mandarin?"
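
One way the "accept the subclass quietly" rule might look, as a Python 
sketch. The SUBCLASS_OF table, the dict layout for assertions, and the 
function names are assumptions for illustration only; it also ignores the 
difference between "instance of" and "type of" to stay short.

```python
SUBCLASS_OF = {"citrus": "fruit", "fruit": "food"}  # made-up class hierarchy


def is_subclass(cls, ancestor):
    """True if ancestor is reachable by walking up SUBCLASS_OF from cls."""
    seen = set()
    while cls in SUBCLASS_OF and cls not in seen:
        seen.add(cls)
        cls = SUBCLASS_OF[cls]
        if cls == ancestor:
            return True
    return False


def add_assertion(assertions, subject, cls):
    """Add 'subject is a cls'; challenge only if it clashes with an unrelated class."""
    existing = [a for a in assertions if a["subject"] == subject and a["active"]]
    general = [a for a in existing if is_subclass(cls, a["cls"])]
    unrelated = [a for a in existing if a not in general and a["cls"] != cls]
    if unrelated:
        return f"I thought it was a {unrelated[0]['cls']}!"
    for a in general:
        a["active"] = False  # ignored from now on, but never deleted
    assertions.append({"subject": subject, "cls": cls, "active": True})
    return f"Okay, got it. Anything else about {subject}?"


facts = [{"subject": "Mandarin", "cls": "fruit", "active": True}]
print(add_assertion(facts, "Mandarin", "citrus"))
```

Keeping the general assertion around (just marked inactive) means nothing 
is lost if the more specific one is later retracted.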

> To be honest, this sounds like you might get away with making it an
> internal thing - I'd start by building your internal datastructures to
> support the application, and then worry about mapping to RDF
> later. (RDF is very clumsy for certain things - reification and
> ordered collections are two of them)

The datastructures are there; it's just a matter of whether this 
information is helpful to the SW as a whole. The confidence figures may 
not be very accurate (the poll sample is likely to be two or three 
users for many nodes), but they might help other reasoners assess the 
quality of various assertions. I suppose this is not something that's 
"built-in" to RDF, but more importantly, is that information useful?

> If it's any interest to you, I'm currently experimenting with an RDF
> like model without the URIs (using tags instead of URIs). It trades
> simplicity for increased ambiguity. I'm experimenting with UI and
> statistical methods for disambiguation.
[clip]

      That's fascinating stuff -- del.icio.us both excites and terrifies 
me. :) I'm not sure how I feel about trying to add semantic weight to 
tags, but statistical analysis is certainly an interesting step. Why 
not a hybrid? You could work all day with tags, and then to convert to 
RDF, you could turn "FrenchHorn" into 
"http://www.phildawes.net/tags/FrenchHorn." I've thought about 
employing statistics in likn, specifically to guess cardinality within 
a colony, but you'd need to guess and check:
likn: "Hey, I've noticed that all people have one heart, and that no 
one has zero hearts. Do all people have exactly one heart?"
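
      The "guess" half of guess-and-check could be as simple as the Python 
sketch below. The function name and the shape of the input (a mapping from 
each known individual to how many values it has for a property) are my 
assumptions here, not anything likn does today.

```python
def guess_cardinality(counts):
    """Return n if every known individual has exactly n values, else None.

    counts maps an individual to the number of values seen for one property,
    e.g. {"Ben": 1, "Phil": 1} for "has heart".
    """
    distinct = set(counts.values())
    if len(distinct) == 1:
        return distinct.pop()
    return None  # no data, or the individuals disagree


hearts = {"Ben": 1, "Phil": 1, "Alice": 1}
n = guess_cardinality(hearts)
if n is not None:
    print(f"Everyone I know about has exactly {n}. Is that a rule?")
```

The guess is only a candidate; the "check" half is still a question put to 
a user.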
      But maybe that would be a waste of everyone's time. If no one has 
specifically requested a constraint, maybe it's not sufficiently 
important to the users to remember. That is, no one is likely to try to 
tell likn about a person with two hearts (although they might tell likn 
about someone with no heart ^_^), and no one is likely to ask likn if a 
person must have a heart, so why bother a user about it?
      On the other hand (or, uh, back to the original hand), maybe the 
stats could be used in non-constraining ways:
me: "Ben has two hearts."
likn: "I've never heard of anyone with two hearts! Are you sure?"
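
      A minimal Python sketch of that non-constraining check, assuming the 
bot can look up the counts it has already seen for a property. The function 
name and arguments are hypothetical.

```python
def react(new_count, noun_plural, observed_counts):
    """Question a count nobody has reported before, without rejecting it."""
    if observed_counts and new_count not in observed_counts:
        return f"I've never heard of anyone with {new_count} {noun_plural}! Are you sure?"
    return "Okay, got it."


# Every person seen so far has one heart, so two gets questioned.
print(react(2, "hearts", [1, 1, 1]))
```

With no observations at all it just accepts the statement, since there is 
nothing to be surprised by.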
      And when answering a query definitively, you could ask the user if 
she wanted to add a constraint:
me: "Do all people have hearts?"
likn: "All the people I know about have exactly 1 heart. Is that a 
requirement of 'person'?"

Thanks for the feedback, Phil!

- ben

Received on Saturday, 5 March 2005 12:26:13 UTC