Comments about the complexity of evaluating property paths from Marcelo Arenas on 2012-02-15 (public-rdf-dawg@w3.org from January to March 2012)

From: Marcelo Arenas <marcelo.arenas1@gmail.com>
Date: Wed, 15 Feb 2012 09:50:33 -0300
To: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <CA+7aXj7Em6Vp7gFLTWirVQPgP--vHxCytVE_+bxx8+ciQ-MvMQ@mail.gmail.com>

Dear All,

It has been mentioned in some emails that the evaluation of property
paths over large cliques is problematic. I would like to point out
that in our paper:

http://www.dcc.uchile.cl/~jperez/sub-www-ext.pdf

we use cliques to provide a double-exponential lower bound for the
time complexity of the function ALP (used to define the semantics of
property paths of the form (:P)* in
http://www.w3.org/TR/sparql11-query). But in our paper we also show
that neither cliques nor large graphs are necessary to obtain
problematic cases:

- In the the foaf experiment shown in Section 2.3, we consider
strongly connected components constructed from real data (they were
constructed by starting from Axel's foaf document and then retrieving
friends, friends of friends, etc. following foaf:knows links). In this
experiment, no query engine was able to handle a small strongly
connected component of 15KB for a query with a single property path
(foaf:knows)*. Notice that much larger strongly connected components
are likely to occur in real data.

- In Section 5.1, we show that for some small queries, the evaluation
of property paths even for cliques with 4 nodes is infeasible. Again,
notice that in practice larger cliques are likely to occur.

I would also like to point out that the #P-completeness in data
complexity of the evaluation of property paths of the form (:P)* (see
Theorem 5.1) is not proved by considering cliques.

Cheers,

Marcelo Arenas

Received on Wednesday, 15 February 2012 12:51:04 UTC