A long but hopefully interesting introduction

Hello,

The last time I worked with metadata on the web was with MCF files 
almost ten years ago, but now I'm very anxious to dive back in. To that 
end, I've developed a nice juicy project to work on to help me sort 
through and understand all the issues involved. The project is named 
"likn," which is a sort of head-on collision between "liken" and 
"link."

The project itself is sort of a wiki-ish system which is syndicated in 
a zillion different ways, and which collects and maintains a boatload 
of metadata with an associated dynamic ontology. More specifically, it 
will be an open-source mod_perl application which supports "solo" posts 
and public wiki-like documents, and an associated chatbot which asks 
and answers questions about nodes and their relationships. The system 
outputs XHTML, RSS, RDF and OWL descriptions of the data and 
relationships contained in it. Every node is syndicated, so if a node 
is a Class, its RSS will reflect any new sub-classes or instances (e.g., 
if you subscribe to "citrus," you'll get a notification any time the node 
is edited or replied to, as well as when someone adds "mandarin orange" 
and classifies it as a type of citrus).

Because likn will be generating vast amounts of metadata and building 
ontological information on the fly, I want to make sure it will have a 
very positive ecological impact on the Semantic Web. In that vein, there 
are several things that I have immediate questions about. Please bear 
with me if my questions are naive...

1) Each installation of the software will be building its own ontology 
as more information is added. The chatbot recognizes and happily 
digests statements such as "a person can only have one mother." Thus, 
the site's ontology is not fixed and carefully crafted, but public, not 
fully trustworthy, and ever evolving, which is, in my opinion, The Way 
It Should Be. The problem is that I'm concerned that this might violate 
the spirit of OWL; it's my understanding that OWL ontologies are meant 
to be stable, versioned and reusable, in the hopes that people will 
share or merge standard versions of them. It's of course possible to 
share and merge a dynamic ontology, but it must be done with the 
understanding that the constraints and statements made are suspect and 
in flux, and ideally the reasoner should be able to understand how 
often it should check for new versions (either through something like 
sy:updateFrequency or through its own cache rules and a "Last-Modified" 
field). Because eventually, someone is likely to tell likn "a person 
can have more than one mother, but only one birth mother."
       (1.a) One workaround is to describe the constraints and 
relationship types in plain RDF and not use OWL at all. But then I'm 
using a non-standard and homebrew method of describing the ontology, 
when the whole point is to facilitate interchange.
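
To make the re-check idea in (1) concrete, here is a minimal sketch of how a consumer might decide that its cached copy of a likn ontology is stale. It assumes the feed carries sy:updatePeriod and sy:updateFrequency values in the style of the RSS 1.0 syndication module; the function names, the defaults and the 30-day/365-day approximations are illustrative assumptions, not part of likn.

```python
from datetime import datetime, timedelta

# Approximate period lengths; "monthly" and "yearly" are rough assumptions.
PERIODS = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
    "monthly": timedelta(days=30),
    "yearly": timedelta(days=365),
}


def refresh_interval(update_period="daily", update_frequency=1):
    """Time between checks: one period divided by the updates per period."""
    if update_frequency < 1:
        raise ValueError("update_frequency must be a positive integer")
    return PERIODS[update_period] / update_frequency


def is_stale(fetched_at, now, update_period="daily", update_frequency=1):
    """True when the cached ontology is at least one refresh interval old."""
    return now - fetched_at >= refresh_interval(update_period, update_frequency)


# Hypothetical usage: an ontology advertised as changing twice an hour.
fetched = datetime(2005, 3, 5, 8, 0)
print(is_stale(fetched, datetime(2005, 3, 5, 8, 31), "hourly", 2))
```

A consumer relying on HTTP caching instead could skip this arithmetic and simply compare the "Last-Modified" value against the one it stored at fetch time.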

2) Does anyone have any philosophical objections to using OWL Full to 
liberally allow Classes as Property Values? I read 
<http://www.w3.org/TR/swbp-classes-as-values/> with great interest, and 
would like to allow many relationships to form using the model 
described in Approach 1. I want to be able to preserve the ability to 
have the following exchange, without resorting to hackery such as 
intermediary nodes like "LionSubject":
me: Lions: Life in the Pride's subject is Lions.
likn: I assume you mean its subject is 'Lion?'
me: Yup. Now tell me about lion.
likn: Lion is a type of Animal, and is the subject of the book 'Lions: 
Life in the Pride.'
....
In short, is there any good reason to explicitly separate Classes from 
Property Values, when it makes so much sense not to?

3) There's the obvious issue of duplication -- one of the most 
attractive aspects of a shared ontology is that you don't have to 
repeat someone else's work, but that's exactly what likn asks its users 
to do. Someone may have developed a beautiful ontology to describe 
food, but because a likn installation may service a community with its 
own definitions of the same terms and their relationships, we can't 
directly use other ontologies. Within an installation, likn is an open, 
free-linking system, but to the outside world, it's a "Push" provider 
of data. You can utilize a likn ontology outside of likn, but it would 
only really be useful for examining data from that particular likn 
colony -- you wouldn't want to rely in your own application on its 
description of "star wars," for example, for fear that its definition 
could change from the movie to the Reagan proposal. So at first blush, 
publishing likn ontologies seems useless to anyone -- but then I can 
imagine a third party developing (for example) a really amazing 
OWL-based search engine, which could be very useful for finding things 
in likn colonies.

4) One possibility is to allow the recognition/merging of other 
ontologies, but qualify their use within likn. For instance:
me: tell me about dog
likn: 'dog' is a type of animal, but according to AnimalNet, dog is a 
type of 'mammal.'
Which is all well and good, but what if you want to create 
equivalences? If you want to say that our 'dog' is equal to AnimalNet's 
'dog,' now anyone asking about dog gets something like:
likn: 'dog' is a type of animal and a type of mammal.
me: what's a mammal?
likn: I don't know, but according to AnimalNet, mammal is a type of 
animal.
Now we have two rival definitions of 'animal'. Although likn could be smart 
enough to ignore redundant statements (given the two statements "ben is 
an instance of programmer" and "ben is an instance of person," likn 
will favor the more specific type of person), it can't (or shouldn't) 
automatically infer that AnimalNet's 'animal' is equivalent to our 
'animal,' because our likn colony could in fact be a Muppet fansite, 
and 'Animal' could talk very specifically about the character of the 
same name (although in that case, no one would assert that 'dog' is a 
type of animal). So things get very confusing and messy. Is there a 
good/established/proposed way of handling this? Possibly through 
reification?
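
One way to picture the problem in (4) is to key every term by its source, so that AnimalNet's 'animal' and the local 'animal' stay distinct until someone explicitly equates them. The sketch below is only an illustration under that assumption: the (source, term) tuples, the subclass_of mapping and both function names are hypothetical, and it assumes the subclass graph has no cycles. It also shows the "favor the more specific type" rule from the programmer/person example.

```python
def ancestors(cls, subclass_of):
    """Return every transitive superclass of cls."""
    seen = set()
    stack = [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def most_specific(types, subclass_of):
    """Drop any asserted type that is a superclass of another asserted type."""
    redundant = set()
    for t in types:
        redundant |= ancestors(t, subclass_of)
    return [t for t in types if t not in redundant]


# Terms are (source, name) pairs, so same-named terms from different
# sources are never merged by accident.
subclass_of = {
    ("likn", "programmer"): [("likn", "person")],
    ("AnimalNet", "mammal"): [("AnimalNet", "animal")],
}

# "person" is implied by "programmer", so only the specific type survives.
print(most_specific([("likn", "programmer"), ("likn", "person")], subclass_of))

# The local 'animal' is not AnimalNet's 'animal', so nothing is dropped.
print(most_specific([("likn", "animal"), ("AnimalNet", "mammal")], subclass_of))
```

Under this scheme an equivalence between the two 'animal' terms would have to be an explicit, attributable statement rather than something inferred from matching names.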

5) One aspect of the app is that users can vote on assertions. So if 
three people agree that "ben is an instance of person" and one person 
disagrees, likn is 75% sure that ben is a person. Is it best to just 
do this via a reified statement such as the following?
<rdf:Description>
      <rdf:subject rdf:resource="http://likn.org/dog" />
      <rdf:predicate rdf:resource="http://likn.org/footType" />
      <rdf:object rdf:resource="http://likn.org/paw" />
      <rdf:type rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement" />
      <likn:confidence>75</likn:confidence>
</rdf:Description>
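
For illustration, the vote arithmetic and the reified statement above could be generated along these lines. This is a minimal sketch, not likn's actual code: the function names are made up, the likn namespace URI is an assumption (the snippet above does not declare one), and rounding to a whole percentage is a design choice.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
LIKN_NS = "http://likn.org/"  # assumed namespace URI for likn terms


def confidence(agree, disagree):
    """Percentage of voters who agree, rounded to a whole number."""
    total = agree + disagree
    if total == 0:
        raise ValueError("no votes cast")
    return round(100 * agree / total)


def reified_statement(subject, predicate, obj, agree, disagree):
    """Serialize one voted-on assertion as a reified rdf:Statement."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("likn", LIKN_NS)
    desc = ET.Element(f"{{{RDF_NS}}}Description")
    parts = (("subject", subject), ("predicate", predicate), ("object", obj))
    for name, uri in parts:
        ET.SubElement(desc, f"{{{RDF_NS}}}{name}", {f"{{{RDF_NS}}}resource": uri})
    ET.SubElement(
        desc, f"{{{RDF_NS}}}type", {f"{{{RDF_NS}}}resource": RDF_NS + "Statement"}
    )
    score = ET.SubElement(desc, f"{{{LIKN_NS}}}confidence")
    score.text = str(confidence(agree, disagree))
    return ET.tostring(desc, encoding="unicode")


# Three votes for, one against, as in the example above.
xml_text = reified_statement(
    "http://likn.org/dog",
    "http://likn.org/footType",
    "http://likn.org/paw",
    agree=3,
    disagree=1,
)
print(xml_text)
```

Storing the raw vote counts alongside the percentage might be worth considering, since "75" alone cannot distinguish three votes out of four from three hundred out of four hundred.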

6) Does anyone have any input, guidance or problems with my general 
approach, or specific aspects?

Anyway, thanks in advance -- and hello!

- ben syverson

Received on Saturday, 5 March 2005 08:14:05 UTC