RE: On Parameters from Orri Erling on 2009-03-25 (public-rdf-dawg@w3.org from January to March 2009)

From: Orri Erling <erling@xs4all.nl>
Date: Wed, 25 Mar 2009 15:59:56 +0100
To: "'Steve Harris'" <steve.harris@garlik.com>, "'SPARQL Working Group'" <public-rdf-dawg@w3.org>
Message-Id: <200903251500.n2PF0D0Y058873@smtp-vbr5.xs4all.nl>

-----Original Message-----
From: public-rdf-dawg-request@w3.org [mailto:public-rdf-dawg-request@w3.org]
On Behalf Of Steve Harris
Sent: Wednesday, March 25, 2009 1:03 PM
To: SPARQL Working Group
Subject: Re: On Parameters

First of all, I think that parameterised queries are a good idea in  
general, though really only to provide reliable escaping - something  
that SPARQL client libraries could do just as well, as in early ODBC  
implementations.

....

Steve

1.  I'd say the result format is a matter for content negotiation and,
failing that, the format should be a literal.  The implementation may produce
different query compilations depending on format.  Making this a run time
setting instead of a compile time one is not an issue either.

2. The argument types should be stated similarly to the XML result set
format.  This would be a message body for POST.  If it is in the GET, then
the xsd notation could do.  We accept the xsd notation in the GET.

3. Stored procedures are quite necessary for online applications.  But if
the WG will not agree on parameters which are a comparatively
straightforward thing, it is manifestly impossible to agree on stored
procedures or optimizer hints.  So, procedures are necessary but
implementations will each deal with this as they may.

But procedures cannot be used for federation, where, if joining across end
points, we expect to have large batches of queries differing only in
literals.  These may be only one or two triple patterns long, which in turn
means that the compile time is quite significant as opposed to the few
microseconds it takes to find a triple.  And of course, if you have a server
running on a cluster, you cannot send single triple lookups over the
interconnect without dying of latency.  So, for SQL clients, which have
array parameters, if you do a select sum (l_extendedprice) from lineitem
where l_orderkey = ?, and the table is partitioned on l_orderkey, we
partition each of the parameter rows first, then send them to their
respective partitions, and gather and order the results at the end and then
send them to the SQL client.  This absorbs the easily 100-200 microsecond
latency of a cluster round trip and gives reasonable platform utilization.
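
A minimal sketch of the scatter/gather step described above, with in-memory
dictionaries standing in for cluster partitions.  The modulo partitioning
function, the names (partition_of, run_on_partition, execute_array_query) and
the toy data are assumptions made for illustration, not how any particular
server does it.  The point is only that each partition is visited once per
batch instead of once per parameter row.

```python
from collections import defaultdict

NUM_PARTITIONS = 4  # assumed cluster size, for illustration only


def partition_of(order_key):
    """Hypothetical partitioning function on l_orderkey."""
    return order_key % NUM_PARTITIONS


def build_partitions(rows):
    """Distribute (l_orderkey, l_extendedprice) rows over the partitions."""
    parts = [defaultdict(list) for _ in range(NUM_PARTITIONS)]
    for order_key, price in rows:
        parts[partition_of(order_key)][order_key].append(price)
    return parts


def run_on_partition(partition, batch):
    """One simulated round trip: answer every parameter row in the batch.

    Returns (position, sum) pairs; None mimics SQL NULL for sum over no rows.
    """
    out = []
    for pos, key in batch:
        prices = partition.get(key)
        out.append((pos, sum(prices) if prices else None))
    return out


def execute_array_query(partitions, order_keys):
    """select sum(l_extendedprice) ... where l_orderkey = ? for many keys."""
    # Scatter: split the parameter rows by target partition,
    # remembering each row's position in the original array.
    batches = defaultdict(list)
    for pos, key in enumerate(order_keys):
        batches[partition_of(key)].append((pos, key))

    # One round trip per partition touched, not one per parameter row.
    results = []
    for pid, batch in batches.items():
        results.extend(run_on_partition(partitions[pid], batch))

    # Gather: restore the order of the parameter array for the client.
    results.sort(key=lambda r: r[0])
    return [total for _, total in results]
```

With rows for order keys 1, 2 and 5 loaded, a call such as
execute_array_query(parts, [5, 1, 2, 9]) makes two simulated round trips
(keys 5, 1 and 9 share a partition under the assumed modulo scheme) rather
than four.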

On the RDF side, the case where such things will occur is federation.  In
this situation, the end point should read all the incoming queries, determine
that these are the same query apart from literals, compile it once, then do
the equivalent of the above.  It can be done without parameters in the
protocol if the end point
is smart enough but parameters make this explicit.
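
As a rough illustration of the "smart enough" end point, the sketch below
groups a batch of incoming query strings by a template in which string and
numeric literals are replaced by placeholders, so that each template would be
compiled once and run with an array of parameter rows.  The regular expression
is a deliberately simplified stand-in for a real SPARQL tokenizer, IRIs are
left in place as part of the template, and the names (to_template,
group_batch) and the %N placeholder syntax are assumptions for this example.

```python
import re
from collections import defaultdict

# Simplified tokenizer (an assumption, not a full SPARQL lexer):
#   group 1: IRIs, kept verbatim so digits inside them are not touched
#   group 2: double-quoted strings (with optional datatype or language tag)
#            and plain numbers, which become parameters
TOKEN = re.compile(
    r'(<[^<>\s]*>)'
    r'|('
    r'"(?:[^"\\]|\\.)*"(?:\^\^(?:<[^<>\s]*>|[\w-]*:[\w-]+)|@[A-Za-z-]+)?'
    r'|(?<![\w?$:])\d+(?:\.\d+)?\b'
    r')'
)


def to_template(query):
    """Return (template, params) with literals replaced by %1, %2, ..."""
    params = []

    def swap(match):
        if match.group(1):          # an IRI: part of the template
            return match.group(1)
        params.append(match.group(2))
        return "%" + str(len(params))

    template = TOKEN.sub(swap, query)
    template = " ".join(template.split())  # ignore whitespace differences
    return template, tuple(params)


def group_batch(queries):
    """Map each distinct template to the list of parameter rows seen for it."""
    groups = defaultdict(list)
    for query in queries:
        template, params = to_template(query)
        groups[template].append(params)
    return dict(groups)
```

Each key of the returned dictionary is a candidate for a single compilation,
and its list of parameter rows can then be partitioned and dispatched in the
same way as the SQL array parameters.  With explicit parameters in the
protocol, the client would supply the template and the rows directly and this
guessing step would not be needed.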

4.  The { ?s ?p ?o } query is indeed pathological.  But then, if the user
knows about parameters, the user ought also to know a little about where
they fit.  

Federation may be the most important use case.  The second would be showing
something like a dashboard on a login page, a relatively simple query that
repeats all the time for different users.  But the latter may well end up as
a stored procedure regardless.

I would say that this is engineering as opposed to research.

Orri

Received on Wednesday, 25 March 2009 15:10:21 UTC