Re: NP completeness & rdf entailment, graph identity, MT etc. from Pat Hayes on 2001-10-08 (w3c-rdfcore-wg@w3.org from October 2001)

From: Pat Hayes <phayes@ai.uwf.edu>
Date: Mon, 8 Oct 2001 16:04:34 -0500
To: "Jeremy Carroll" <jjc@hplb.hpl.hp.com>
Cc: w3c-rdfcore-wg@w3.org
Message-Id: <p0510100db7e7bff41be0@[205.160.76.193]>
>This message is a bit muddled, sorry in advance.
>
>The main points are:
>+ rdf entailment is NP complete
>+ finding a canonical graph from a given RDF graph is NP complete
>+ equivalence of RDF graphs is NP complete

Presume you mean semantic equivalence, ie A equivalent B means: A 
entails B and B entails A (?)

>========================
>
>
>We have this thread about whether a graph is a set or a bag.
>
>http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Oct/thread.html#61
>
>I fear it is the tip of an iceberg.
>
>My understanding is that we have:
>
>two concrete syntaxes: RDF/XML and N-triple
>one abstract syntax:   a graph (see MT).
>one model:             see Pat's MT
>
>('one' in "one model" means "one model theoretic articulation of RDF").

ie one notion of entailment, might be better way to put it.

>
>So the "graphs are sets?!" thread was triggered by my observation that an
>ntriple file with two identical triples
>
>A:
>
><uri> <pred> <uri2> .
><uri> <pred> <uri2> .
>
>was equivalent to one with only one:
>
>B:
>
><uri> <pred> <uri2> .

They are certainly equivalent in the sense that each entails the 
other. I thought what we were discussing was whether we even want to 
allow the first one as a well-formed graph.

>
>This was intended as a harmless sort of comment :-(.
>It was motivated by the graph abstract syntax.
>
>Under the model theory, we would say that two graphs are equivalent if they
>mutually entail one another.
>This would mean - wait for it Aaron :) - that
>
>C:
>
><uri> <pred> <uri2> .
><uri> <pred> _:bnode .
>
>was another version of the same RDF model (if not the same RDF graph!).

No, don't say THAT. We can't possibly only have one version of every 
two equivalent graphs. Well, maybe we could, but I don't think it 
would be a good idea, because as soon as the language gets slightly 
more expressive it will be genuinely impossible. It might work for 
RDF but only because pure RDF is so extremely simple.

>From a model theoretic point of view A B and C are all equivalent.
>
>This follows from the bnode being taken as an existentially qualified
>variable.

Right. On the other hand, if we wanted to go this way, a rational 
argument could be made for making C *syntactically* illegal on the 
same grounds that A would be, viz. it contains redundant information. 
This is one way reason I'd rather not go in this direction; lets keep 
syntactic identity/indistinguishability separate from semantic 
equivalence.

>
>I assume that it would, in the fullness of time, be desirable to determine a
>canonical RDF graph for any particular equivalence class of RDF graphs, for
>instance, the equivalence class containing graphs corresponding to A, B, and
>C would have canonical form B.

Well, Im not sure. It might be worth trying to do that, but bear in 
mind that it wouldn't generalise very far (eg not even to RDFS, at 
least not without a lot of extra work; eg there are any number of 
different graphs that all have the same rdfs-closure. There might be 
a 'minimal' one, I guess, though I havnt thought about how to prove 
it. ).

>This paragraph refers to the graph not to a
>serialization of the graph. The canonicalization of the serialization of a
>graph is known to be graph isomorphism complete.
>
>
>If we consider an extension of C being:
>
>D:
>
><uri> <pred> <uri2> .
><uri> <pred> _:bnode .
>_:bnode <pred> _:bnode2 .
>
>then we can no longer ignore the second triple, since it is saying something
>more substantive than the second triple in C.

Right. B entails C, but not D.

>
>=================
>
>Observations:
>
>rdf entailment as stated in the model theory requires in the worst case an
>NP complete computation (proof: an arbitrary subgraph isomorphism can be
>encoded in rdf entailment - but not in RDF/XML).

>rdf graph canonicalization requires an NP complete computation. This is
>intended to mean solely at the level of the graph the abstract syntax,
>before we serialize the graph itself. Canonical graph serialization is graph
>isomorphism complete.
>
>two rdf graphs are equivalent if they entail one another. this also is NP
>complete - i.e. it is (thought to be) worse than graph isomorphism.
>
>==============
>
>I am unclear as to whether these are theoretical difficulties that can
>safely be ignored; or real practical difficulties that should be addressed
>by fundamental changes to the model theory.

I don't think that they are too surprising, and that we shouldnt 
*worry* about them. There is no way to get past them in any case by 
changing the model theory; we would have to abandon anonymous nodes 
altogether in order to get rid of the subgraph-isomorphism property. 
Even if we used Ntriples documents with bNodes, there is always the 
possibility of permuting the bNode labels, and that gives us the 
leeway to encode graph isomorphism. .

Pat

-- 
---------------------------------------------------------------------
IHMC					(850)434 8903   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola,  FL 32501			(850)202 4440   fax
phayes@ai.uwf.edu 
http://www.coginst.uwf.edu/~phayes
Received on Monday, 8 October 2001 17:04:41 UTC