RDF graph query, graph equivalence and inference from Graham Klyne on 2003-06-26 (www-archive@w3.org from June 2003)

From: Graham Klyne <gk@ninebynine.org>
Date: Thu, 26 Jun 2003 12:01:49 +0100
To: Jeremy Carroll <jjc@hpl.hp.com>, Dave Reynolds <der@hplb.hpl.hp.com>
Cc: www-archive <www-archive@w3.org>, Brian McBride <bwm@hplb.hpl.hp.com>
Message-Id: <5.1.0.14.2.20030626111221.00bb0fb8@127.0.0.1>

Jeremy,
Dave,

I'm fishing for some sanity checks.

I'm up to my elbows in implementing inference primitives on RDF 
graphs.  What I'm aiming for is a modular set of tools that I can use to 
construct inference patterns, using Haskell as a kind of 'glue' or 
scripting language.  I'm trying to reach the parts that CWM can't reach, 
and aim to use this to create domain-specific inference tools (e.g. for 
continuing my network configuration work).  Datatype reasoning is one of my 
targets (e.g. having a specific datatype for IP network addresses).

So far, my effort has been to implement forward- and backward- chaining 
based on graph query and substitution primitives, and that seems to be 
working fine.

But simple chaining techniques don't seem to handle graph instance 
inference very well (cf. the RDF interpolation lemma).  Jeremy, in our 
discussion of graph isomorphism, you said something that seemed to suggest 
that, in general, that kind of entailment was a "hard problem" to test, and 
am trying to understand the extent to which that applies here.  Forward 
chaining to introduce bnodes seems to be rather unbounded, and backward 
chaining to ground graphs seems to have similar problems.

My current strategy is to introduce a variant of my graph forward-chaining 
query function that matches a query graph with bnodes against a target 
graph, and use that to match the outputs from backward chaining against 
input data.  In general, I suspect that this query potentially suffers from 
some computational tractability problems:  I think it's O(N*M) where N and 
M are the size of query and target graphs respectively -- matching each 
triple of the query graph against each triple of the target graph -- but 
I'm also thinking that this should be manageable given that the query 
graphs will usually be quite small (I hope:  backward chaining seems to 
generate a number of small antecedent graphs that can be explored 
separately), and that the target graph will not contain bnodes (or if it 
does, they are not treated as variables by the query).

So, some sanity checks I'd like to float are:

(1) does my query strategy fully reflect RDF simple entailment?  I think 
so, in that it should (if the query function is correct) find all 
substitutions of bnodes that make the query graph a subgraph of the target 
graph, which I think corresponds to the interpolation lemma (an instance 
being a substitution of bnodes).

(2) Jeremy, does the computational difficulty of this query process 
correspond to the computational difficulty of subgraph isomorphism?  I 
think it's simpler because of not treating any of the target graph nodes as 
variables.

(3) Dave, you mentioned that you've implemented a hybrid reasoner for 
Jena.  Does the combination of backward chaining and forward query 
described above seem comprable, or have you a different strategy?

Any thoughts?

#g


-------------------
Graham Klyne
<GK@NineByNine.org>
PGP: 0FAA 69FF C083 000B A2E9  A131 01B9 1C7A DBCA CB5E

Received on Thursday, 26 June 2003 07:08:43 UTC