Re: RDF-Entailment: Remove duplicate anonymous resources - looking for an algorithm

It might be useful at this point to formalize what the problem is.  Reto 
said earlier a very general version of the problem: find for a graph g 
"the smallest graph g1, so that g entails g1 and g1 entails g."

I think this might be too general unless we add that g1 is a subset of 
g.  (Otherwise, consider that I can write a single statement and then 
define the semantics of that statement outside of RDF to entail any graph.)

So the general problem then is for graph g, find non-empty graphs a and 
b such that a union b = g and a entails b.  Then, b can be removed.  But 
splitting a graph into two parts is tricky because of blank nodes -- 
each blank node can appear in either a or b, but not both.  For every b, 
for every blank node in b, every statement in g mentioning that node 
must be in b.  (Someone named something like this a 'minimal 
self-contained graph', MSG.)

Ignoring any type of inferencing and since a and b don't have any 
statements in common, a can only entail b if b is a subgraph of a but 
with one or more nodes in each statement replaced by other blank nodes 
from g -- b is just a weaker version of some part of a.  And, b must 
contain blank nodes.  (Okay, so you knew this already.)

Finally, a crude algorithm would be to find all subgraphs b of g such 
that 1) every statement in b has a blank node, 2) b contains all 
statements in g mentioning blank nodes in b (so there is at most one b 
for every blank node), 3) (g-b) entails b, which is to say that there is 
a mapping M from blank nodes in b to nodes not in b such that (g-b) 
contains M(b).

That's all off the top of my head... Could be wrong.  And as for an 
efficient algorithm, I have no idea...

-- 
- Joshua Tauberer

http://taubz.for.net

** Nothing Unreal Exists **

Received on Monday, 21 November 2005 14:08:41 UTC