- From: Drew McDermott <drew.mcdermott@yale.edu>
- Date: Fri, 7 Nov 2003 13:41:21 -0500 (EST)
- To: www-rdf-logic@w3.org, weisheng@pcigeomatics.com, skeens@pcigeomatics.com
[Stephane Fellah] If I understand well, RDF assumes that every triple is a fact (an assertion), so they are always true. Example: I assert 1 + 2 = 5, which becomes in triple (CWM) style: (1,2) math:sum 5. There is no way in RDF to say that this triple is non-asserted except by using reification (but there is no standard for this). I assume that what you mean by RDF++ is support for non-asserted statements in the RDF model. Is that correct?

Yes.

[Stephane Fellah] My goal is to use the simple RDF triple model to express queries/rules, so I am not introducing another model/language to express them (except for adding the concept of a variable). In the CONTEXT OF A QUERY OR A RULE, I assume that every triple is NON-ASSERTED. Basically, I can summarize by saying that the antecedents of rules, and queries, are knowledge bases of non-asserted statements. The query/inference engine needs to establish the truth of every single statement in the query.

You can do that, but a standard RDF analyzer would assume that both the antecedent and the consequent were true. Of course, if they contain variables, it will not be clear what they're true _of_.

[Stephane Fellah] Of course these query triples cannot be mixed with a knowledge base, because there would be no way to know which statements are true or false, but I cannot see a case where a query would be mixed with facts. If you see one, please give me examples.

I can certainly see where rules would be mixed with facts. Most rules are in essence just material implications, which are just facts. There's no problem making up a gimmick to ensure that your RDF analyzer properly interprets a set of triples as "non-asserted." The problem is that there's no standard way to do that, so we'd end up with a bunch of inference engines tweaked in incompatible ways.

...

[Stephane Fellah] About variables now: in order for this to work, it is important to introduce the variable as a fundamental concept in the query/rule language (by extending OWL, as in the OWL Rules Language).
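Fellah's point that reification is the only (non-standard) way to mention a triple without asserting it can be sketched concretely. The following is a minimal illustration in Python, using plain tuples in place of an RDF library; the four properties are the standard rdf: reification vocabulary, while the `reify` helper and the statement id are made up for the example.

```python
# Sketch of RDF reification: instead of asserting the triple
# ("(1,2)", "math:sum", "5") directly, we describe it as a statement
# resource. The four triples below talk ABOUT the statement without
# asserting the statement itself.

def reify(stmt_id, subject, predicate, obj):
    """Return the four reification triples describing (subject, predicate, obj)."""
    return [
        (stmt_id, "rdf:type", "rdf:Statement"),
        (stmt_id, "rdf:subject", subject),
        (stmt_id, "rdf:predicate", predicate),
        (stmt_id, "rdf:object", obj),
    ]

for triple in reify("_:stmt1", "(1,2)", "math:sum", "5"):
    print(triple)
```

Note that nothing in RDF itself tells a processor that the described statement is non-asserted rather than merely described, which is exactly the missing-standard problem raised in the exchange above.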
[Stephane Fellah] To indicate that a triple must match, may match, or must not match, we should have three flavors of variable: mustBindVariable, mayBindVariable, mustNotBindVariable (similar to the DQL idea). Also, every triple containing variables is AUTOMATICALLY NON-ASSERTED and requires unification in the query engine.

You can't automatically non-assert something.

[Stephane Fellah]
>Hence the option of making a relation a property is simply not
>available to any system but CWM. If you tried saying "if triple1 then
>triple2," the mere occurrence of triple1 and triple2 would cause them
>to be asserted, so that "if p then q" would have the same meaning as
>"p and q."

Based on my model, this cannot happen, because every triple is assumed non-asserted. Did I miss something?

Yes and no. Yes, you missed the fact that standard RDF engines will misuse your dataset. No: if you refuse to let your RDF out into the wild, then you're safe.

[Stephane Fellah]
>> Which one requires less parsing work?
>I don't understand; parsing from what to what? What kind of work --
>programmer work or computer work?

For both. Promoting relations/operators to subjects of discussion represents the dual view of my property-centric model (the CWM model). If you already have a unification engine for RDF based on triple matching, such as Jena, the subject-centric model requires converting everything to triple patterns internally (requiring an extra step in the processing of the query), and requires understanding the meaning of the operands (which one is the domain and which one is the range). I think the subject-centric approach is more complex to implement.

I wouldn't let my unification algorithm ever see a triple; it will be happy with arbitrary atomic formulas. You pay a one-time conversion cost to get your notation from angly-brackety form to atomic formulas, and after that it's irrelevant how the formulas were encoded in the RDF world.
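The point about the unifier never needing to see triples can be made concrete with a small sketch. Assuming a one-time conversion has already turned the triple "(1,2) math:sum 5" into the atomic formula `("math:sum", "1", "2", "3")`, a textbook unifier works over such tuples directly. Variables are strings beginning with `?`; all names are illustrative, and the occurs check is omitted for brevity.

```python
# Minimal unification over atomic formulas represented as nested tuples.
# The engine never sees RDF triples; it only sees formulas.

def unify(a, b, env=None):
    """Unify two terms; return a binding dict, or None on failure."""
    env = dict(env or {})

    def walk(t):
        # Follow variable bindings to their current value.
        while isinstance(t, str) and t.startswith("?") and t in env:
            t = env[t]
        return t

    a, b = walk(a), walk(b)
    if a == b:
        return env
    if isinstance(a, str) and a.startswith("?"):
        env[a] = b
        return env
    if isinstance(b, str) and b.startswith("?"):
        env[b] = a
        return env
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            env = unify(x, y, env)
            if env is None:
                return None
        return env
    return None

fact = ("math:sum", "1", "2", "3")
query = ("math:sum", "1", "2", "?x")
print(unify(query, fact))  # {'?x': '3'}
```

Whether the formulas originally lived as RDF triples, N3, or anything else is invisible at this layer, which is exactly the "one-time conversion cost" argument.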
[Stephane Fellah] I would like to summarize my approach and its benefits:

I'll interleave my responses.

[Stephane Fellah] - Queries/antecedents of Horn rules are represented as an RDF document (so no need for a specific parser).

Rules and assertions are represented as predicate calculus. If someone asks for the RDF version, we can generate it.

[Stephane Fellah] - In this CONTEXT, the statements MUST BE CONSIDERED AS NON-ASSERTED (I think this is a very reasonable assumption).

Predicate calculus already does the right thing. The RDF documents we generate require no special "non-assertion" assumption. Any RDF processor can examine them without drawing any faulty conclusions. Some RDF processors will know how to extract the logical formulas back out of them.

[Stephane Fellah] - The model is mathematically correct (relations/operators are RDF properties: they have a domain and a range).

Predicate calculus is mathematically correct.

[Stephane Fellah] - Simply extend a current unification engine by adding built-in properties (as in CWM).

An old and good idea, incorporated into most theorem provers.

[Stephane Fellah] - It is easy to publish the operators an inference engine supports by publishing the ontology of operators.

Our RDF documents have an ontology, too.

[Stephane Fellah] - Requires minimal change to the RDF model (just adding the variable concept).

We require no change at all to the RDF model. Actually, you're going to need a bit more than you realize. I think you said so a while back, when you said that a triple in a rule was non-asserted even if it contained no variables.

[Stephane Fellah] - Mappable to any query language.

Are there query languages that are not mappable to some other query languages?

[Stephane Fellah] - Easy to perform reasoning over and to refactor the query.

I don't know what this means.

[Stephane Fellah] - Each triple is independent and can be executed in the most adequate order (triples with a mustBindVariable first, for example).

These are standard query-optimization ideas, right?

[Stephane Fellah] I hope I managed to communicate my thoughts.
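The "execute triples in the most adequate order" idea is the classic join-ordering heuristic from database query optimization: run the pattern with the fewest still-unbound variables first, so later patterns are more constrained. A minimal sketch, with made-up facts and `?`-prefixed variables:

```python
# Evaluate a conjunctive query over a tiny fact base, always choosing
# the triple pattern with the fewest unbound variables next -- the
# standard selectivity/join-ordering heuristic.

facts = {
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "age", "30"),
}

def match(pattern, fact, env):
    """Match one pattern against one fact under bindings env."""
    env = dict(env)
    for p, f in zip(pattern, fact):
        p = env.get(p, p)              # substitute bound variables
        if isinstance(p, str) and p.startswith("?"):
            env[p] = f                 # bind a fresh variable
        elif p != f:
            return None
    return env

def run(patterns, env=None):
    """Yield every binding environment satisfying all patterns."""
    env = env or {}
    if not patterns:
        yield env
        return
    # Pick the pattern with the fewest still-unbound variables.
    def unbound(pat):
        return sum(1 for t in pat if t.startswith("?") and t not in env)
    best = min(patterns, key=unbound)
    rest = [p for p in patterns if p is not best]
    for fact in facts:
        env2 = match(best, fact, env)
        if env2 is not None:
            yield from run(rest, env2)

query = [("?x", "knows", "?y"), ("?y", "age", "30")]
print([e["?x"] for e in run(query)])  # ['bob']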
[Stephane Fellah] It is possible that I am missing something very fundamental in my reasoning, because I do not have a strong AI background. If this approach does not work, I really would like to know why. Thanks for your patience.

It will work fine. I don't know why this idea wasn't adopted years ago, but those who have proposed it have met with serious resistance. It seems the winds are shifting, though. Just as garbage collection went overnight from being a crutch for sissies to an obviously good idea, the notion of marking some triples as non-asserted may be about to be declared an obvious idea that the W3C always intended to toss into one of its layers someday.

By the way, I don't know that AI is really relevant here. It seems that what people are really excited about nowadays is reinventing database theory, not AI.

--
   Drew McDermott
   Yale University CS Dept.
Received on Friday, 7 November 2003 13:42:48 UTC