SHOE, general inferences in RDF, and other stuff

Hi everyone.  My name is Sean Luke, and I am a member of the SHOE team at
the University of Maryland.  SHOE is an SGML- and XML-based
knowledge-representation language with much the same design goals as RDF.
Dan Brickley mentioned to me that there is some interest in getting RDF to
do certain SHOE-like things (for example, n-ary relations and inferences),
and has suggested that I post to this list with some current suggestions I
have for RDF based on our research experience with SHOE.  SHOE started
about the same time as MCF; it predates RDF by a little bit.  As such,
many issues that RDF has dealt with recently have also been ones that the
SHOE team grappled with a few years back.  All things SHOE can be found at
http://www.cs.umd.edu/projects/plus/SHOE/

So at Dan's suggestion, Jeff Heflin and I have spent a few days detailing
various areas where we think RDF has weaknesses, some minor, some (IMHO)
grievous, and where our design decisions might stir up some discussion
about where RDF is going, if not make some positive contributions in its
design.  We've tallied a pretty big list of things to discuss (including a
lot of little minor quibbles), but rather than just dump it all here I
figure the right thing to do is post a topic or two at a time.

I thought a good place to start would be with SHOE's biggest departure
from RDF, namely its general inferential semantics.  But there are also a
lot of other areas we think RDF could be improved, including versioning,
n-ary relations, abandoning a frame-based approach, adding data types, and
providing a better namespace mechanism.  If anyone's interested in those,
I'd be glad to post our ideas on them and see what you think.


Inferences and RDF
------------------

RDF's basic semantic design philosophy is different from SHOE's. RDF seems
to follow in the footsteps of frame languages and semantic networks; SHOE
used to be a semantic network as well, but now tends to follow relational
and logical languages: it provides n-ary relations and a general inference
mechanism.

Many of RDF's bigger special-case warts (hard-coded "container" objects,
infinite sets of relations for numbering in containers, "aboutEach",
"subPropertyOf", etc.) stem from its lack of any basic general-purpose
inferential rule mechanism.  With even a very simplistic Horn clause
mechanism, *all* of RDF's present special-case semantics (except its
aboutEachPrefix feature) can be dealt with trivially.
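As a rough illustration of that claim (our own toy encoding, not RDF or SHOE syntax), here is a Python sketch in which the subClassOf special case becomes two ordinary Horn clauses, applied repeatedly until no new facts appear.  Triples are plain (subject, property, object) tuples and the property names are simplified; the class and instance names are made up for the example.

```python
# Toy encoding (an assumption, not RDF syntax): facts are
# (subject, property, object) tuples with simplified property names.
# The subClassOf special case written as two Horn clauses:
#   subClassOf(?a,?c) :- subClassOf(?a,?b) ^ subClassOf(?b,?c)
#   type(?x,?c)       :- type(?x,?b) ^ subClassOf(?b,?c)

def rdfs_closure(triples):
    """Apply both rules repeatedly until a fixpoint (no new triples) is reached."""
    facts = set(triples)
    while True:
        sub = {(s, o) for (s, p, o) in facts if p == "subClassOf"}
        typ = {(s, o) for (s, p, o) in facts if p == "type"}
        new = {(a, "subClassOf", c) for (a, b) in sub for (b2, c) in sub if b == b2}
        new |= {(x, "type", c) for (x, b) in typ for (b2, c) in sub if b == b2}
        if new <= facts:          # nothing new: the closure is complete
            return facts
        facts |= new


closed = rdfs_closure({
    ("Dog", "subClassOf", "Mammal"),
    ("Mammal", "subClassOf", "Animal"),
    ("rex", "type", "Dog"),
})
print(("rex", "type", "Animal") in closed)   # True
```

Nothing in the loop knows about classes specifically; swap in different rules and the same fixpoint loop handles them.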

So why is this a problem?  I'd be willing to bet some serious money that
as RDF progresses, the pressure to add more and more special-case warts to
the language will grow, as users rapidly outgrow the present finite
semantic usefulness of a language with no general-purpose inferential
mechanism.  RDF's semantics are presently pretty watered down, so much so
that it is difficult (not impossible) to coherently argue why using RDF is
better than just writing an application in XML for your schema and going
with that.

In the evolution of SHOE we realized that to do anything really useful
with the language, especially in a diverse environment with multiple
schemas that will need to be mapped to one another, we needed at least some
basic semantics.  The trick was doing this without increasing
computational complexity beyond usefulness.  We think we've done a fair
job, but we'd be grateful for your opinions on it.

SHOE's present semantics are equivalent to Datalog without negation (not
even stratified negation).  No negation, no procedural attachment, no
numerical operations, and only a single element in the consequent of the
Horn clause.  Here's a *simple* example of transitivity over membership in
binary relations, which as best as I can tell is not describable in RDF.
At any rate, if it is expressible in RDF, I'm sure we can come up with
another that's not :-).  Here we go:

Suppose someone has designed a "university-organization" schema.  In it,
one is able to say that they are a "member" of a given organization.  
Organizations are further able to say that they are "suborganizations" of
other organizations.  Now, I work for the PLUS laboratory, which is a
suborganization of the Advanced Information Technology Laboratory, which
is a suborganization of both UMIACS and the Computer Science Department,
which are both suborganizations of U Maryland, which is a suborganization
of the State of Maryland.  It seems an obviously useful thing for an agent
to discover that I'm an employee of the State of Maryland without me
having to explicitly say all of these employment ground facts; I should
merely have to say that I work for the PLUS lab.

The inferential statement is:

    member(?org2,?person) :- member(?org1,?person) ^
                             subOrganization(?org1,?org2)

This is read as: "?person is a member of ?org2 if ?person is a member of
?org1 and ?org1 is a suborganization of ?org2". In a SHOE schema this
looks like:

<DEF-INFERENCE>
  <INF-IF>
    <RELATION NAME="member">
      <ARG POS=1 VALUE="org1" USAGE=VAR />
      <ARG POS=2 VALUE="per" USAGE=VAR />
    </RELATION>
    <RELATION NAME="subOrganization">
      <ARG POS=1 VALUE="org1" USAGE=VAR />
      <ARG POS=2 VALUE="org2" USAGE=VAR />
    </RELATION>
  </INF-IF>
  <INF-THEN>
    <RELATION NAME="member">
      <ARG POS=1 VALUE="org2" USAGE=VAR />
      <ARG POS=2 VALUE="per" USAGE=VAR />
    </RELATION>
  </INF-THEN>
</DEF-INFERENCE>

Now, if I claim member(PLUS Lab,me), and the PLUS Lab has declared
subOrganization(PLUS Lab,AITL), etc., then an agent should be able to
easily determine that I work for the State of Maryland (or the CS
Department, or whatever).

Of course, RDF does provide one or two special-case versions of
transitivity (subPropertyOf and subClassOf).  But there are lots of
examples, like the one above, that don't fit these special cases very well.

So there are of course a lot of places where inferencing comes in handy:
reducing the number of ground statements that must be made explicitly (RDF
presently doesn't provide basic general-purpose transitive closure,
inversion, or transfers-through relationships), setting up mappings between
schemas and also among versions of a schema, and extending the semantic
meaning involved when a web page claims some fact is true.
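Each of the three patterns just mentioned fits in a single Horn clause.  The Python sketch below writes them as rule templates over binary relations and closes a small fact set under them.  The function names and the worksFor/hasSubOrganization relations are hypothetical, and reading "R transfers through S" as R(?x,?z) :- R(?x,?y) ^ S(?y,?z) is our assumption about the term.

```python
from collections import defaultdict

# Hypothetical rule templates over binary relations.  Each returns a function
# that takes the current facts (relation -> set of pairs) and returns the
# relation it derives into plus the derived pairs.

def transitive(r):
    # r(?x,?z) :- r(?x,?y) ^ r(?y,?z)
    return lambda f: (r, {(x, z) for (x, y) in f[r] for (y2, z) in f[r] if y == y2})


def inverse(r, s):
    # s(?y,?x) :- r(?x,?y)
    return lambda f: (s, {(y, x) for (x, y) in f[r]})


def transfers_through(r, s):
    # r(?x,?z) :- r(?x,?y) ^ s(?y,?z)   (our reading of "transfers through")
    return lambda f: (r, {(x, z) for (x, y) in f[r] for (y2, z) in f[s] if y == y2})


def close(ground, rules):
    """Apply all rule templates until none of them adds a new pair."""
    f = defaultdict(set)
    for rel, a, b in ground:
        f[rel].add((a, b))
    changed = True
    while changed:
        changed = False
        for rule in rules:
            rel, derived = rule(f)
            if not derived <= f[rel]:
                f[rel] |= derived
                changed = True
    return f


facts = close(
    {("worksFor", "me", "PLUS Lab"),
     ("subOrganization", "PLUS Lab", "AITL"),
     ("subOrganization", "AITL", "UMIACS"),
     ("subOrganization", "UMIACS", "U Maryland")},
    [transitive("subOrganization"),
     inverse("subOrganization", "hasSubOrganization"),
     transfers_through("worksFor", "subOrganization")],
)
print(("me", "U Maryland") in facts["worksFor"])   # True
```

The point of the design is that none of the three needs its own special-case semantics; they are just different rule bodies handed to one closure loop.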

The critical issue is of course how to implement such semantics without
getting bogged down in a morass of computational complexity à la KIF,
given the potential amount of data out there.  Since they are mappable to
Datalog, SHOE's inferential semantics are at worst polynomial in the amount
of data (for a fixed set of inference rules), and we believe that they are
limited enough to deploy on a large scale, especially in domains where
ontologies are relatively disjoint from one another.
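The polynomial claim follows from a standard counting argument for negation-free Datalog, sketched here under the assumption that the rules are fixed and only the ground data grows, with constant-time fact lookup.  Let n be the number of distinct constants, a_R the arity of relation R, and v the largest number of distinct variables in any single rule:

```latex
N \;\le\; \sum_{R} n^{a_R}
\qquad \text{(at most this many distinct ground facts can ever exist)}

\text{rounds} \;\le\; N + 1,
\qquad
\text{work per round} \;=\; O\!\left(|\mathrm{rules}| \cdot n^{v}\right)

\Longrightarrow\; T(n) \;=\; O\!\left(N \cdot |\mathrm{rules}| \cdot n^{v}\right)
```

Each round either adds at least one new fact or stops, which gives the bound on rounds.  For the member rule alone, two binary relations and three variables give N ≤ 2n^2 and a crude bound of O(n^5); the exponent depends on the rules, not on the size of the data.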

Still, I can think of some further restrictions that could be applied to
enforce an even more efficient inferential approach.  For example, a schema
X might declare itself to belong to one of three levels: one that does not
provide inferences, one that disallows any inferences from outside schemas
involving symbols declared in X, and one that permits full inferential
capabilities.  Revisions of a schema (later versions) are permitted only to
increase the semantic level or keep it the same, but not decrease it.
Even more semantic expressivity might be added in this way (another level
which permits stratified negation, for example).  Also certain relations
or simple (one-level, non-recursive) inferences might be declared Final,
to indicate to agents that they should be simply flattened when gathered
rather than inferred over and over.  Certainly most if not all of RDF's
current special-case inference mechanisms can be declared Final without a
significant decrease in speed of data-gathering.
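The level and Final ideas are simple enough to state as a couple of checks.  In this hypothetical Python sketch the level names, the worksFor/employs relations, and the function names are our own; it only shows that monotone revision is a comparison on an ordered scale, and that a Final rule is applied once when facts are gathered instead of being re-derived on every query.

```python
from enum import IntEnum


class SemanticLevel(IntEnum):
    # Hypothetical names for the three proposed levels, ordered by expressivity.
    NO_INFERENCES = 0   # the schema provides no inferences
    LOCAL_ONLY = 1      # our reading: other schemas may not write inferences using X's symbols
    FULL = 2            # full inferential capabilities


def revision_allowed(old: SemanticLevel, new: SemanticLevel) -> bool:
    """A later version of a schema may keep or raise its level, never lower it."""
    return new >= old


def flatten_final(gathered, final_rules):
    """Apply each Final (one-level, non-recursive) rule once, at gathering time.

    Assumes every Final rule's body mentions only ground relations, so a
    single pass over the gathered facts yields everything the rule can derive.
    """
    flattened = set(gathered)
    for rule in final_rules:
        flattened |= rule(gathered)
    return flattened


# Hypothetical Final rule: employs(?org,?per) :- worksFor(?per,?org)
def employs_rule(facts):
    return {("employs", org, per) for (rel, per, org) in facts if rel == "worksFor"}


print(revision_allowed(SemanticLevel.FULL, SemanticLevel.LOCAL_ONLY))   # False
print(flatten_final({("worksFor", "me", "PLUS Lab")}, [employs_rule]))
```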

Sean

Received on Wednesday, 22 December 1999 18:15:51 UTC