- From: Sean Luke <seanl@cs.umd.edu>
- Date: Mon, 27 Dec 1999 12:43:31 -0500 (EST)
- To: Sankar Virdhagriswaran <sv@crystaliz.com>
- cc: www-rdf-interest@w3.org, jhendler@darpa.mil, heflin@cs.umd.edu
On Thu, 23 Dec 1999, Sankar Virdhagriswaran wrote:

> > So why not just dump the special-case stuff and go with simple
> > (simplistic) general-purpose inferential semantics to begin with?
>
> I am wondering about two different types of scalability:
>
> a) Agent writer(s) scalability: The 'real world' still is hacking away in
> VBScript/Perl. For them, inferencing (of the logic/prolog kind) is too
> foreign. Given an object model, they are more prone to write 'procedural'
> agents. At best, we can hope them to write SQL queries.

That's fair; such people's schemas won't use general inferences. But lack
of use doesn't argue against including such things for those groups which
*do* choose to use inferences.

> Can you imagine writing an 'inferencing engine' for the semantic web
> that can scale the same way if one adopts your model? I think we need
> to start very, very simple since we are looking at a scalability that
> has not been attempted by any of the knowledge representation projects
> I know.

Now _this_ gets to the heart of the matter. While Datalog is polynomial,
and in general I imagine schemas on the web will be relatively disjoint
(or hierarchical at worst), Datalog's worst case is still pretty bad
compared to a single SQL query.

But such things can be had in bits and pieces. As I mentioned before, one
nice model would be for RDF to provide several "levels" to which schemas
can subscribe. A basic level would have no inferential semantics. A higher
level would allow only those semantics which can be easily flattened
(inferences declared "final", that is, which operate only over ground
facts or facts recursively created by the inference itself). This would
give a tremendous advantage to schema-merging and eliminate the need for
redundant ground statements such as the examples in my earlier post.
Higher levels still might permit only inferences actually declared in the
schema proper, or ultimately all inferences.
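[A rough sketch of what I mean by flattening a "final" inference, assuming
a made-up transitive rule such as subClassOf; the function and data names
here are illustrative only, not part of any RDF proposal:]

```python
def flatten_transitive(pairs):
    """Materialize the transitive closure of a binary relation
    (e.g. a subClassOf-style rule) given as (sub, super) tuples.
    Because the rule is "final" -- it only derives facts from ground
    facts and its own derivations -- the closure can be computed once
    at fact-gather time and stored as plain tuples."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        # Join the relation with itself: (a,b) and (b,d) imply (a,d).
        derived = {(a, d) for (a, b) in closure
                          for (c, d) in closure if b == c}
        if not derived <= closure:
            closure |= derived
            changed = True
    return closure

ground = {("Dog", "Mammal"), ("Mammal", "Animal")}
flat = flatten_transitive(ground)
# flat now also contains ("Dog", "Animal"); a query like "find all
# Animals" becomes a single SQL-style lookup over stored tuples, with
# no inference engine needed at query time.
```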
If Altavista doesn't have a system capable of some level of sophisticated
semantics, it can simply tell people that it doesn't accept schemas which
insist on them. If specialized agents can use inferences to their
advantage, let them. Natural selection on the web will sort this out
nicely.

While it's one thing to say that a Datalog-like model has too much
computational complexity for the current RDBMSs out there, it's hard to
say this while RDF _presently_ has semantic features which are already
difficult to implement in a database. aboutEach, subPropertyOf,
subClassOf, etc. are already O(n^2), though I imagine they could be
flattened at fact-gather time (aboutEach would require keeping the rules
around, because it's open-world). In its simplest-to-implement form
(flattening), reification bulks up relational tuples with unique
identifiers. Reification in combination with aboutEach could also result
in pretty nasty worst-case semantics, though I believe that
serendipitously RDF's restriction against inverse relations lets it
escape that dilemma for now.

At any rate, it seems to me that by insisting on binary relations and
several custom-designed inferential mechanisms, RDF is painting itself
into a corner: it requires these features of even lowly RDBMSs, while
making it difficult to create more general mechanisms that cover these
warts for the more advanced agents which will handle general semantics in
the future.

> We are operating in a completely different world as compared to the
> closed world situations where knowledge representation techniques have
> been used in the past.

Of course the web is open-world. But since both RDF and SHOE are
additive-only and monotonic, I'm not sure this matters.

Sean
Received on Monday, 27 December 1999 12:43:39 UTC