Re: A long but hopefully interesting introduction

On Sat, 5 Mar 2005 00:23:36 -0600, ben syverson <w3@likn.org> wrote:

> The project itself is sort of a wiki-ish system which is syndicated in
> a zillion different ways, and which collects and maintains a boatload
> of metadata with an associated dynamic ontology. More specifically, it
> will be an open-source mod_perl application which supports "solo" posts
> and public wiki-like documents, and an associated chatbot which asks
> and answers questions about nodes and their relationships. 

Sounds great! 

There are a few bots around #swig that may be of interest, like julie.
(Incidentally the sembot thing is another resident lurker on my own
to-do list).

The system
> outputs XHTML, RSS, RDF and OWL descriptions of the data and
> relationships contained in it. Every node is syndicated, so if a node
> is a Class, its RSS will reflect any new sub-classes or instances (eg,
> if you subscribe to "citrus," you'll get notification any time the node
> is edited or replied to, as well as when someone adds "mandarin orange"
> and classifies it as a type of citrus).

Like Phil, I'll be interested to hear how you intend to do this.

> Because likn will be generating vast amounts of metadata and building
> ontological information on the fly, I want to make sure it will have a
> very positive ecological impact in terms of the SW. In that vein, there
> are several things that I have immediate questions about. Please bear
> with me if my questions are naive...
> 
> 1) Each installation of the software will be building its own ontology
> as more information is added. The chatbot recognizes and happily
> digests statements such as "a person can only have one mother."

Nice.

Thus,
> the site's ontology is not fixed and carefully crafted, but public, not
> fully trustworthy, and ever evolving, which is, in my opinion, The Way
> It Should Be. 

For an application like this, agreed 100%. The ecosystem should
support a whole spectrum of trustworthiness, including bits related to
ontologies.

The problem is that I'm concerned that this might violate
> the spirit of OWL; it's my understanding that OWL ontologies are meant
> to be stable, versioned and reusable, in the hopes that people will
> share or merge standard versions of them.

Hmm, I think there are at least two angles to that - if something's
reasonably stable per-version then it's going to be properly reusable.
But that doesn't mean it has to be totally rigid over time; in
particular, more statements relating to the ontology may be added in
just the same way that statements relating to instance data may be
added. Again I think there's a bit of app-specificity here: for some
purposes rigidity is desirable to maintain some kind of correctness
(like avoiding the apparently common confusion between the terms
"traveller" and "terrorist"); at other times a slip may just misplace
something in a blog index (and not land anyone in jail). Either way,
it's probably worth considering whether some kind of proof mechanism
can be used, so that if you do wind up with unexpected conclusions you
can backtrack to their source.

I'm beginning to get the feeling that SemWeb development is a little
hampered by assumptions from other languages, especially those around
XML & relational DB schema. Much of the power of RDF/OWL comes from
the flexibility, and we shouldn't be frightened of (for example)
creating many and/or huge ontologies and only using a tiny fraction of
the available terms. Assuming the modelling could go either way,
putting something in the Class system rather than using instances (or
even literals) means there's more potential for reasoning. Classes
and properties are cheap!

In other words, if I'm not entirely happy with
http://purl.org/stuff/pets#Cat then I shouldn't hesitate to define a
new term, say http://dannyayers.com/2005/05/pets#Cat. I could maintain
desirable semantics by making the new term a subclass of the old one,
or whatever. If I don't need to make any modifications, ok, I've got a
duplicated term. But this is still reusing the existing ontology, and
the cost shouldn't be too great. It may take more convoluted inference
to get answers, but I reckon that flexibility to facilitate model
"fitness" is more important than religious direct reuse and/or bending
things for the sake of performance - that seems like premature
optimisation.
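
To make that concrete, the subclassing move is a single triple in
Turtle (the dannyayers.com URI is just a placeholder I'm inventing for
the example):

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix pets: <http://purl.org/stuff/pets#> .
@prefix my:   <http://dannyayers.com/2005/05/pets#> .

# The new term specialises the existing one rather than replacing it,
# so anything stated about my:Cat stays connected to pets:Cat.
my:Cat rdfs:subClassOf pets:Cat .
```

A reasoner that sees both graphs can still conclude that any my:Cat is
a pets:Cat, so queries written against the original vocabulary keep
working.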

Until fairly recently I would have encouraged the direct-reuse
approach to vocabularies, but there are definitely circumstances where
this doesn't go down too well and might even be counter-productive.
(Case in point being Atom - overall there's been a great desire to
create things from scratch, even if that meant a lot of wheel
reinvention. But there's no real net loss, as relationships with other
vocabularies can be identified later, independently).

 It's of course possible to
> share and merge a dynamic ontology, but it must be done with the
> understanding that the constraints and statements made are suspect and
> in-flux, and ideally the reasoner should be able to understand how
> often it should check for new versions (either through something like
> sy:updateFrequency or through its own cache rules and a "Last-Modified"
> field). Because eventually, someone is likely to tell likn "a person
> can have more than one mother, but only one birth mother."
>        (1.a) One workaround is to describe the constraints and
> relationship types in plain RDF and not use OWL at all. But then I'm
> using a non-standard and homebrew method of describing the ontology,
> when the whole point is to facilitate interchange.

In the context of RDF/OWL, I don't think homebrew and interchange are
mutually exclusive - if anything, hopefully they'll be complementary
(my homebrew can connect to your homebrew, thus connecting the
interwiki upper ontology you reference to the standard furry quadruped
vocabulary I use...).

> 2) Does anyone have any philosophical objections to using OWL Full to
> liberally allow Classes as Property Values? I read
> <http://www.w3.org/TR/swbp-classes-as-values/> with great interest, and
> would like to allow many relationships to form using the model
> described in Approach 1. I want to be able to preserve the ability to
> have the following exchange, without resorting to hackery such as
> intermediary nodes like "LionSubject":
> me: Lions: Life in the Pride's subject is Lions.
> likn: I assume you mean its subject is 'Lion?'
> me: Yup. Now tell me about lion.
> likn: Lion is a type of Animal, and is the subject of the book 'Lions:
> Life in the Pride.'
> ....
> In short, is there any good reason to explicitly separate Classes from
> Property Values, when it makes so much sense not to?

Wow, I got deja vu on that one, I must have asked the same question
myself in the recent past. It's not very explicit in that doc, but all
things being equal it's not so much a philosophical question as a
computational one. If you start treating classes as individuals then
it makes inference that much more complex. This may not be a problem
if you're just using the graph model aspect of RDF, or wiring up your
own app-specific reasoner, but if you're wanting to plug in an
off-the-shelf DL engine then you'll have to abide by the language's
constraints.
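
For reference, the classes-as-values pattern (Approach 1 in that note)
only takes a couple of triples - sketched here in Turtle, with the ex:
namespace invented for illustration:

```turtle
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix dc:   <http://purl.org/dc/elements/1.1/> .
@prefix ex:   <http://example.org/animals#> .

ex:Lion rdf:type owl:Class ;
        rdfs:subClassOf ex:Animal .

# The class itself is the value of dc:subject - perfectly legal RDF,
# but this puts the ontology in OWL Full rather than OWL DL.
ex:LionsLifeInThePride dc:subject ex:Lion .
```

No intermediary "LionSubject" node needed, at the cost of DL
compatibility.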

> 3) There's the obvious issue of duplication -- one of the most
> attractive aspects of a shared ontology is that you don't have to
> repeat someone else's work, but that's exactly what likn asks its users
> to do. Someone may have developed a beautiful ontology to describe
> food, but because a likn installation may service a community with its
> own definitions of the same terms and their relationships, we can't
> directly use other ontologies. Within an installation, likn is an open,
> free-linking system, but to the outside world, it's a "Push" provider
> of data. You can utilize a likn ontology outside of likn, but it would
> only really be useful for examining data from that particular likn
> colony -- you wouldn't want to rely in your own application on its
> description of "star wars," for example, for fear that its definition
> could change from the movie to the Reagan proposal. So at first blush,
> publishing likn ontologies seems useless to anyone -- but then I can
> imagine a third party developing (for example) a really amazing
> OWL-based search engine, which could be very useful for finding things
> in likn colonies.

This all sounds reasonable, but I would suggest that in a situation
like this a little indirection is probably desirable. The approach
I've ended up using with similar stuff is to partition up the
vocab/ontology space. So a certain vocabulary may contain more
generally shared definitions (e.g. WordNet) but then I might have
corresponding terms in the vocabulary I'm using in the context of a
particular project, or even in the context of my personal blog. If
each of these is maintained in a separate namespace, then there's much
more flexibility for interconnection.

I may begin by asserting that "Pet" on my blog is an equivalent class
to "Pet" in WN. But then my usage may drift, and so I can shift a gear
to make raw:Pet more or less general than wn:Pet (i.e. raw:Pet
rdfs:subClassOf wn:Pet or wn:Pet rdfs:subClassOf raw:Pet, rather than
the bidirectional relationship of equivalence). Ok, this assumes that
the Sem Web won't remember the first assertion, but at this point in
time that seems a fair pragmatic assumption, and when you're looking
at local reasoning that's pretty easy to arrange. I suppose the
ontologies should be versioned and annotated as cleanly as possible,
but until you need to hook into other caches/triplestores which
remember your earlier assertions on the Web there shouldn't be too
many problems (datestamped annotations are probably a good idea).
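
Sketched in Turtle (raw: and wn: namespaces invented for the example),
the gear-shift is just swapping one triple for another:

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix raw:  <http://example.org/raw#> .
@prefix wn:   <http://example.org/wordnet#> .

# First pass: the two terms coincide exactly.
raw:Pet owl:equivalentClass wn:Pet .

# Later, once usage has drifted, retract the statement above and keep
# only the weaker, one-directional relationship:
raw:Pet rdfs:subClassOf wn:Pet .
```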

There's additional help when it comes to more, errm, humanish
terminology (great for blogs and Wikis) in the form of SKOS, which
allows for less tightly-bound relationships between terms without
throwing away all the reasoning potential. I reckon MortenF's FOAF
Output Plugin for WordPress is a real shining light here, gluing
simple tagging "folksonomies" (yerch, horrid contraction) to formal
knowledge representation, all without any extra user input beyond the
simple install.
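
For anyone who hasn't looked at SKOS yet, those looser relationships
run along these lines (concept URIs invented for the example):

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/tags#> .

# A folksonomy tag lifted into a SKOS concept: "broader" and "related"
# are much softer links than rdfs:subClassOf or owl:equivalentClass.
ex:cats a skos:Concept ;
        skos:prefLabel "cats" ;
        skos:broader ex:pets ;
        skos:related ex:dogs .
```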

Just my €0.02. Anyhow, be sure to let us know how you get on.

Cheers,
Danny.

Received on Saturday, 5 March 2005 13:01:56 UTC