- From: Orri Erling <erling@xs4all.nl>
- Date: Wed, 25 Mar 2009 15:59:56 +0100
- To: "'Steve Harris'" <steve.harris@garlik.com>, "'SPARQL Working Group'" <public-rdf-dawg@w3.org>
-----Original Message-----
From: public-rdf-dawg-request@w3.org [mailto:public-rdf-dawg-request@w3.org] On Behalf Of Steve Harris
Sent: Wednesday, March 25, 2009 1:03 PM
To: SPARQL Working Group
Subject: Re: On Parameters

First of all, I think that parameterised queries are a good idea in general, though really only to provide reliable escaping - something that SPARQL client libraries could do just as well, as in early ODBC implementations.

....

Steve

1. I'd say the result format is a matter for content negotiation and, if not, then the format should be a literal. The implementation may produce different query compilations depending on the format. Making this a run time setting instead of a compile time one is not an issue either.

2. The argument types should be stated similarly to the XML result set format. This would be a message body for POST. If the parameters are in the GET, then the xsd notation could do. We accept the xsd notation in the GET.

3. Stored procedures are quite necessary for online applications. But if the WG will not agree on parameters, which are a comparatively straightforward thing, it is manifestly impossible to agree on stored procedures or optimizer hints. So, procedures are necessary, but implementations will each deal with this as they may.

But procedures cannot be used for federation, where, if joining across end points, we expect to have large batches of queries differing only in literals. These may be only one or two triple patterns long, which in turn means that the compile time is quite significant compared with the few microseconds it takes to find a triple. And of course, if you have a server running on a cluster, you cannot send single triple lookups over the interconnect without dying of latency.
So, for SQL clients, which have array parameters, if you do a select sum (l_extendedprice) from lineitem where l_orderkey = ?, and the table is partitioned on l_orderkey, we first partition the parameter rows, then send them to their respective partitions, gather and order the results at the end, and then send them to the SQL client. This absorbs the cluster round trip latency, easily 100-200 microseconds, and gives reasonable platform utilization.

On the RDF side, the case where such things will occur is federation. In this situation, the end point should read all the incoming queries, determine that these are the same query, compile it once, then do the equivalent of the above. It can be done without parameters in the protocol if the end point is smart enough, but parameters make this explicit.

4. The { ?s ?p ?o } query is indeed pathological. But then, if the user knows about parameters, the user ought also to know a little about where they fit. Federation may be the most important use case. The second would be showing something like a dashboard on a login page: a relatively simple query that repeats all the time for different users. But the latter may well end up as a stored procedure regardless.

I would say that this is engineering as opposed to research.

Orri
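To make the batching in point 3 concrete, here is a minimal Python sketch of the idea; it is an illustration only and not taken from any actual implementation. It assumes that the incoming queries differ only in bare integer literals, that the first literal is the partitioning key, and that partitions are chosen by a simple modulo. The names split_template, group_by_template, run_batch and the lookup callback are all hypothetical.

```python
import re
from collections import defaultdict

# Naive assumption: the only literals that vary between queries are bare integers.
INT_LITERAL = re.compile(r"\b\d+\b")


def split_template(query):
    """Return (query text with each integer literal replaced by ?, tuple of literals)."""
    params = tuple(int(m) for m in INT_LITERAL.findall(query))
    return INT_LITERAL.sub("?", query), params


def group_by_template(queries):
    """Group incoming queries by template so each template needs compiling only once.

    Each group holds (position in the incoming batch, parameter row) pairs.
    """
    groups = defaultdict(list)
    for position, query in enumerate(queries):
        template, params = split_template(query)
        groups[template].append((position, params))
    return dict(groups)


def run_batch(rows, partition_count, lookup):
    """Scatter parameter rows to partitions, then gather results in input order.

    lookup(partition_id, list_of_param_rows) is a hypothetical stand-in for one
    round trip to a partition; it returns one result per parameter row.
    """
    by_partition = defaultdict(list)
    for position, params in rows:
        # Assumed partitioning rule: first parameter modulo the partition count.
        by_partition[params[0] % partition_count].append((position, params))

    results = {}
    for partition_id, batch in by_partition.items():
        # One round trip per partition instead of one per parameter row.
        answers = lookup(partition_id, [params for _, params in batch])
        for (position, _), answer in zip(batch, answers):
            results[position] = answer

    # Restore the order in which the queries arrived.
    return [results[position] for position in sorted(results)]
```

With three incoming queries that differ only in the order key, group_by_template yields a single template with three parameter rows, and run_batch with two partitions makes two calls to lookup rather than three. An explicit parameter mechanism in the protocol would make the split_template step unnecessary, since the client would send the template and the parameter rows separately.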
Received on Wednesday, 25 March 2009 15:10:21 UTC