rambling about safety Was: Protocol/query interaction - computational cost of a query from Eric Prud'hommeaux on 2004-05-03 (public-rdf-dawg@w3.org from April to June 2004)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Mon, 3 May 2004 09:35:24 +0900
To: "Seaborne, Andy" <andy.seaborne@hp.com>
Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-ID: <20040503003524.GA9495@w3.org>

On Thu, Apr 29, 2004 at 01:15:16PM +0100, Seaborne, Andy wrote:
> 
> 
> One of the things the comparison of query languages talks about is "safety".
> There is an interaction with remote access here - does the query language
> allow or encourage queries which do not terminate or incur excessive
> computation?

[[
- Safety A query language is considered safe, if every query that is
  syntactically correct returns a finite set of results (in a finite
  data set). Typical concepts that cause query languages to be unsafe
  are recursion, negation, and built-in functions.
]] - http://www.aifb.uni-karlsruhe.de/WBS/pha/rdf-query/rdfquery.pdf
-- hand-transcribed from the original document because I can't
   cut/paste PDFs.

It seems that calling a language "safe" really just indicates the
absence of contraindicated features such as recursion, which here
seems to mean "support for transitive properties". I believe we are
chartered to produce a language that can be used for ground facts or
inferences without any syntactic distinction.

[[
1.8 Derived Graphs

The working group must recognize that RDF graphs are often constructed
by aggregation from multiple sources and through logical inference,
and that sometimes the graphs are never materialized. Such graphs may
be arbitrarily large or infinite.
]] - http://www.w3.org/2003/12/swa/dawg-charter#derivedGraphs
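
To illustrate the earlier reading of recursion as "support for
transitive properties": over a finite data set, the closure of a
transitive property is itself finite (at most n*n pairs for n nodes),
so this kind of recursion costs computation but still returns a finite
result. A minimal Python sketch, assuming edges are plain (subject,
object) pairs; the names transitive_closure and Edge are hypothetical
and not taken from the cited survey or any DAWG proposal:

```python
from typing import Set, Tuple

# Hypothetical representation: one (subject, object) pair per statement
# of a single transitive property.
Edge = Tuple[str, str]


def transitive_closure(edges: Set[Edge]) -> Set[Edge]:
    """Naive fixpoint: keep joining pairs until no new pair appears."""
    closure = set(edges)
    while True:
        # Join (a, b) with (b, d) to derive (a, d).
        new = {
            (a, d)
            for (a, b) in closure
            for (c, d) in closure
            if b == c
        } - closure
        if not new:
            # No new pairs: the result is finite and complete.
            return closure
        closure |= new


if __name__ == "__main__":
    cycle = {("a", "b"), ("b", "c"), ("c", "a")}
    # Even with a cycle the loop terminates, because at most 3 * 3 pairs exist.
    print(len(transitive_closure(cycle)))
```

The loop must stop because each pass either adds a pair or returns, and
only finitely many pairs exist. This says nothing about graphs that are
infinite to begin with, which is the case the charter text describes.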

> In sending a query to a remote server, we should be aware we are sending
> around a computation.  A query language that can cause infinite, or
> excessive, computation at the server is not a good idea.  A query could be
> analysable as being safe or the language could be designed so that it
> is not possible, or probably not natural, to express unsafe constructs.

I'm skeptical that the ability to query the product of large or
infinite inferences is worse than the ability to underconstrain
graph queries.

> I don't like the idea that "DAWG-QL" needs to be analysable; I would prefer
> that the language does not have features that allow excessive or infinite
> computation unless that feature is really important.
> 
> This isn't a clear cut matter - if you ask a query that is not very
> selective of a large dataset, you will get back a lot of results and it may
> consume a lot of system resources.  But we can design things to avoid traps
> for the application writer.

I added an underconstraints checker to algae which warns you when you
underconstrain a query. (It also keeps you from doing cross
products.) I did not bother to add any syntax to indicate whether the
client *wanted* this check done. Consequently, this feature had no
syntactic effect.
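
This is not algae's code, but a rough sketch of what such a check
might look like. It assumes a query is a list of triple patterns whose
variables start with "?", treats a pattern with three variables as
underconstrained, and treats groups of patterns that share no
variables as a cross product. The names check_constraints and
variables, and both rules, are assumptions made for illustration:

```python
from typing import List, Set, Tuple

# Hypothetical representation: a triple pattern is three strings,
# and a term starting with "?" is a variable.
Triple = Tuple[str, str, str]


def variables(pattern: Triple) -> Set[str]:
    """Return the variable terms of one triple pattern."""
    return {term for term in pattern if term.startswith("?")}


def check_constraints(patterns: List[Triple]) -> List[str]:
    """Return warnings for underconstrained patterns and cross products."""
    warnings = []

    # A pattern with no constant term matches every triple in the data.
    for pattern in patterns:
        if len(variables(pattern)) == 3:
            warnings.append(f"underconstrained: {pattern} matches every triple")

    # Group patterns into components linked by shared variables.
    components: List[Set[str]] = []
    for pattern in patterns:
        vs = variables(pattern)
        if not vs:
            continue  # fully ground pattern: filters only, binds nothing
        merged = set(vs)
        for component in [c for c in components if c & vs]:
            merged |= component
            components.remove(component)
        components.append(merged)

    # More than one component means the results multiply together.
    if len(components) > 1:
        warnings.append(
            f"cross product: {len(components)} groups of patterns share no variables"
        )
    return warnings


if __name__ == "__main__":
    query = [("?x", "foaf:knows", "?y"), ("?a", "foaf:name", "?b")]
    for warning in check_constraints(query):
        print(warning)
```

Because the check only inspects the patterns, it needs no extra query
syntax; whether to run it is a choice made outside the query itself.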

Perhaps the mythic *extensibility* feature could be used to communicate
underconstraints and inference-limiting flags.

> 	Andy
> 
> PS I like the phrase "RDQL is safe" which reminds me of the Hitch-Hikers
> Guide to the Galaxy: "mostly harmless".  Not sure that translates to other
> parts of the globe.

-- 
-eric

office: +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
cell:   +1.857.222.5741

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.

Received on Sunday, 2 May 2004 20:35:49 UTC