NP completeness & rdf entailment, graph identity, MT etc.

This message is a bit muddled, sorry in advance.

The main points are:
+ rdf entailment is NP complete
+ finding a canonical graph from a given RDF graph is NP complete
+ equivalence of RDF graphs is NP complete

========================


We have this thread about whether a graph is a set or a bag.

http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Oct/thread.html#61

I fear it is the tip of an iceberg.

My understanding is that we have:

two concrete syntaxes: RDF/XML and N-triple
one abstract syntax:   a graph (see MT).
one model:             see Pat's MT

('one' in "one model" means "one model theoretic articulation of RDF").

So the "graphs are sets?!" thread was triggered by my observation that an
ntriple file with two identical triples

A:

<uri> <pred> <uri2> .
<uri> <pred> <uri2> .

was equivalent to one with only one:

B:

<uri> <pred> <uri2> .


This was intended as a harmless sort of comment :-(.
It was motivated by the graph abstract syntax.

Under the model theory, we would say that two graphs are equivalent if they
mutually entail one another.
This would mean - wait for it Aaron :) - that

C:

<uri> <pred> <uri2> .
<uri> <pred> _:bnode .

was another version of the same RDF model (if not the same RDF graph!).
From a model theoretic point of view A B and C are all equivalent.

This follows from the bnode being taken as an existentially qualified
variable.

I assume that it would, in the fullness of time, be desirable to determine a
canonical RDF graph for any particular equivalence class of RDF graphs, for
instance, the equivalence class containing graphs corresponding to A, B, and
C would have canonical form B. This paragraph refers to the graph not to a
serialization of the graph. The canonicalization of the serialization of a
graph is known to be graph isomorphism complete.


If we consider an extension of C being:

D:

<uri> <pred> <uri2> .
<uri> <pred> _:bnode .
_:bnode <pred> _:bnode2 .

then we can no longer ignore the second triple, since it is saying something
more substantive than the second triple in C.


=================

Observations:

rdf entailment as stated in the model theory requires in the worst case an
NP complete computation (proof: an arbitrary subgraph isomorphism can be
encoded in rdf entailment - but not in RDF/XML).

rdf graph canonicalization requires an NP complete computation. This is
intended to mean solely at the level of the graph the abstract syntax,
before we serialize the graph itself. Canonical graph serialization is graph
isomorphism complete.

two rdf graphs are equivalent if they entail one another. this also is NP
complete - i.e. it is (thought to be) worse than graph isomorphism.

==============

I am unclear as to whether these are theoretical difficulties that can
safely be ignored; or real practical difficulties that should be addressed
by fundamental changes to the model theory.

Jeremy

Received on Monday, 8 October 2001 07:27:48 UTC