- From: ben syverson <w3@likn.org>
- Date: Sat, 5 Mar 2005 06:26:10 -0600
- To: semantic-web@w3.org
On Mar 5, 2005, at 4:59 AM, Phil Dawes wrote:

> Hi Ben,
> This sounds like an interesting project, if a bit ambitious!

Ambitious is putting it politely; the only way you can start a project like this is with complete ignorance of what it will take to complete. :) Luckily most of it is behind me; the main things that remain are teaching the chatbot how to ask people to clarify N-ary relationships <http://www.w3.org/TR/swbp-n-aryRelations/> and handle certain constraints, and then figuring out the best way to push all this metadata.

[snip]
> E.g. how do you generate URIs? How do you disambiguate words with the
> same spelling (sleeper, sleeper and sleeper)?. How do you disambiguate
> senses of the same word (myserver1a the server, myserver1a the
> dnsname).
> RDF solves this problem by requiring that the author generates
> seperate URIs for each sense/meaning, but this doesnt map well to a
> user experience.

Since the default interface to the data is a Wiki-ish thing, there's already a built-in fully-qualified URI for every unique term, and it even leads to a page with a description. Of course, there's no way to guarantee that nodes will represent only one concept -- the chatbot would have to be much too hostile to enforce that. :) But there are things you can do. If you try to make something an instance or type of something when it already has a class associated with it, the bot can challenge it.

    me: "Mandarin is an instance of dialect."
    likn: "I thought it was a type of fruit!"

Then, the user is forced to either agree/disagree with the assertion "Mandarin is a type of fruit," or make a statement like "Mandarin is also an instance of dialect." When likn hears "also", it will press you: "do you mean that Mandarin has two meanings, or that Mandarin, a type of fruit, is also an instance of dialect?" (This is where it becomes useful to parse input like "Uh, the first one.") If you tell likn Mandarin has two meanings, it can make a new node (and URI) for it.

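The challenge flow above could be sketched roughly like this. It is only an illustration, not likn's actual code: the `Node` class, the `tell` and `split_meaning` functions, and the `http://example.org/wiki/` base URI are all hypothetical names made up for the example.

```python
from dataclasses import dataclass, field

BASE_URI = "http://example.org/wiki/"  # hypothetical base, not likn's real one


@dataclass
class Node:
    name: str
    classes: list = field(default_factory=list)  # e.g. [("a type of", "fruit")]

    @property
    def uri(self) -> str:
        # Wiki-style URI: one page (and one URI) per unique term.
        return BASE_URI + self.name.replace(" ", "_")


def tell(nodes: dict, name: str, relation: str, cls: str, also: bool = False) -> str:
    """Handle '<name> is [also] <relation> <cls>' and return the bot's reply."""
    node = nodes.setdefault(name, Node(name))
    if node.classes and (relation, cls) not in node.classes:
        old_relation, old_cls = node.classes[0]
        if not also:
            # Challenge the user instead of silently overloading the node.
            return f"I thought it was {old_relation} {old_cls}!"
        return (f"Do you mean that {name} has two meanings, or that {name}, "
                f"{old_relation} {old_cls}, is also {relation} {cls}?")
    if (relation, cls) not in node.classes:
        node.classes.append((relation, cls))
    return f"Okay, got it. Anything else about {name}?"


def split_meaning(nodes: dict, name: str, relation: str, cls: str) -> Node:
    """The user answered 'two meanings': create a separate node (and URI)."""
    new_node = Node(f"{name} ({cls})", [(relation, cls)])
    nodes[new_node.name] = new_node
    return new_node
```

Telling it "a type of fruit" first and "an instance of dialect" second triggers the challenge; repeating the second statement with `also=True` triggers the follow-up question, and `split_meaning` then gives the new sense its own node named "Mandarin (dialect)".
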
Users of the chat system will never need to know there's a separate node for the other meaning. They'll ask "What is mandarin?" and get the reply "Mandarin is a dialect or a type of fruit." However, the Wiki users will notice -- on the "Mandarin" page, all of a sudden they see "(fruit)" next to the name, and see that there's a new link to "Mandarin (dialect)." This will hopefully inspire them to separate the text if both topics are discussed in the first node, or add information if there's nothing in there about the dialect.

But whenever possible, I avoid challenging the user, because it can get annoying. So if a user makes an assertion that a thing is related to a subclass of something it's already related to, likn'll accept it and quietly ignore (but not delete) the more general assertion:

    me: "Mandarin is an instance of citrus."
    likn: "Okay, got it. Anything else about Mandarin?"

> To be honest, this sounds like you might get away with making it an
> internal thing - I'd start by building your internal datastructures to
> support the application, and then worry about mapping to RDF
> later. (RDF is very clumsy for certain things - reification and
> ordered collections are two of them)

The datastructures are there; it's just a matter of whether this information is helpful to the SW as a whole. The confidence figures may not be very accurate (the poll sample is likely to be two or three users for many nodes), but they might help other reasoners assess the quality of various assertions. I suppose this is not something that's "built-in" to RDF, but more importantly, is that information useful?

> If it's any interest to you, I'm currently experimenting with an RDF
> like model without the URIs (using tags instead of URIs). It trades
> simplicity for increased ambiguity. I'm experimenting with UI and
> statistical methods for disambiguation.
[clip]

That's fascinating stuff -- del.icio.us both excites and terrifies me. :) I'm not sure how I feel about trying to add semantic weight to tags, but statistical analysis is certainly an interesting step. Why not a hybrid? You could work all day with tags, and then to convert to RDF, you could turn "FrenchHorn" into "http://www.phildawes.net/tags/FrenchHorn".

I've thought about employing statistics in likn, specifically to guess cardinality within a colony, but you'd need to guess and check:

    likn: "Hey, I've noticed that all people have one heart, and that no one has zero hearts. Do all people have exactly one heart?"

But maybe that would be a waste of everyone's time. If no one has specifically requested a constraint, maybe it's not sufficiently important to the users to remember. That is, no one is likely to try to tell likn about a person with two hearts (although they might tell likn about someone with no heart ^_^), and no one is likely to ask likn if a person must have a heart, so why bother a user about it?

On the other hand (or, uh, back to the original hand), maybe the stats could be used in non-constraining ways:

    me: "Ben has two hearts."
    likn: "I've never heard of anyone with two hearts! Are you sure?"

And when answering a query definitively, you could ask the user if she wanted to add a constraint:

    me: "Do all people have hearts?"
    likn: "All the people I know about have exactly 1 heart. Is that a requirement of 'person'?"

Thanks for the feedback, Phil!

- ben
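
P.S. A rough sketch of the non-constraining "guess and check" idea, in case it helps make it concrete. This is only an illustration under assumptions: the function names `guess_cardinality` and `react_to_count` and the shape of the data (one value count per known instance) are made up for the example and are not how likn actually stores things.

```python
def guess_cardinality(counts):
    """Return the shared count if every known instance has the same
    number of values for a property, else None (no guess)."""
    counts = list(counts)
    if not counts:
        return None
    distinct = set(counts)
    return distinct.pop() if len(distinct) == 1 else None


def react_to_count(known_counts, new_count, noun):
    """Question (but do not reject) a count that contradicts the guess."""
    guess = guess_cardinality(known_counts)
    if guess is not None and new_count != guess:
        return f"I've never heard of anyone with {new_count} {noun}! Are you sure?"
    return "Okay, got it."


# Hypothetical data: how many hearts each known person has.
hearts = {"Phil": 1, "Ben": 1, "Alice": 1}
print(guess_cardinality(hearts.values()))
print(react_to_count(hearts.values(), 2, "hearts"))
```

The guess is never stored as a constraint here; it is only used to decide when to ask the user to confirm.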
Received on Saturday, 5 March 2005 12:26:13 UTC